Preface


Objective of the Book 


The first edition of Basic Econometrics was published thirty years ago. Over the years, 
there have been important developments in the theory and practice of econometrics. In 
each of the subsequent editions, I have tried to incorporate the major developments in the 
field. The fifth edition continues that tradition. 

What has not changed, however, over all these years is my firm belief that econometrics 
can be taught to the beginner in an intuitive and informative way without resorting to 
matrix algebra, calculus, or statistics beyond the introductory level. Some subject material 
is inherently technical. In such cases I have either put the material in the appropriate appendix or
referred the reader to the appropriate sources. Even then, I have tried to simplify the technical
material so that the reader can get an intuitive understanding of it.

I am pleasantly surprised not only by the longevity of this book but also by the fact that 
the book is widely used not only by students of economics and finance but also by students 
and researchers in the fields of politics, international relations, agriculture, and health 
sciences. All these students will find the new edition with its expanded topics and concrete 
applications very useful. In this edition I have paid even more attention to the relevance and 
timeliness of the real data used in the text. In fact, I have added about fifteen new illustrative
examples and more than thirty new end-of-chapter exercises. Also, I have updated
the data for about two dozen of the previous edition’s examples and more than twenty 
exercises. 

Although I am in the eighth decade of my life, I have not lost my love for econometrics, 
and I strive to keep up with the major developments in the field. To assist me in this 
endeavor, I am now happy to have Dr. Dawn Porter, Assistant Professor of Statistics at the 
Marshall School of Business at the University of Southern California in Los Angeles, as 
my co-author. Both of us have been deeply involved in bringing the fifth edition of Basic 
Econometrics to fruition. 

Major Features of the Fifth Edition 


Before we discuss the specific changes in the various chapters, the following features of the
new edition are worth noting:

1. Practically all of the data used in the illustrative examples have been updated. 

2. Several new examples have been added. 

3. In several chapters, we have included extended concluding examples that illustrate the 
various points made in the text. 

4. Concrete computer printouts of several examples are included in the book. Most of these 
results are based on EViews (version 6) and STATA (version 10), as well as MINITAB 
(version 15). 

5. Several new diagrams and graphs are included in various chapters. 

6. Several new data-based exercises are included in the various chapters.

7. Small data sets are included in the book, but large data sets are posted on the book’s
website, thereby minimizing the size of the text. The website will also publish all of the
data used in the book and will be periodically updated.
8. In a few chapters, we have included class exercises in which students are encouraged to
obtain their own data and implement the various techniques discussed in the book. Some 
Monte Carlo simulations are also included in the book. 

Specific Changes to the Fifth Edition 


Some chapter-specific changes are as follows: 

1. The assumptions underlying the classical linear regression model (CLRM) introduced 
in Chapter 3 now make a careful distinction between fixed regressors (explanatory 
variables) and random regressors. We discuss the importance of the distinction. 

2. The appendix to Chapter 6 discusses the properties of logarithms, the Box-Cox
transformations, and various growth formulas.

3. Chapter 7 now discusses not only the marginal impact of a single regressor on the 
dependent variable but also the impacts of simultaneous changes of all the explanatory 
variables on the dependent variable. This chapter has also been reorganized to follow the
same structure as the assumptions in Chapter 3.

4. A comparison of the various tests of heteroscedasticity is given in Chapter 11. 

5. There is a new discussion of the impact of structural breaks on autocorrelation in 
Chapter 12. 

6. New topics included in Chapter 13 are missing data, non-normal error term, and
stochastic, or random, regressors. 

7. A non-linear regression model discussed in Chapter 14 has a concrete application of 
the Box-Cox transformation. 

8. Chapter 15 contains several new examples that illustrate the use of logit and probit
models in various fields. 

9. Chapter 16 on panel data regression models has been thoroughly revised and
illustrated with several applications.

10. An extended discussion of Sims and Granger causality tests is now included in
Chapter 17.

11. Stationary and non-stationary time series, as well as some of the problems associated 
with various tests of stationarity, are now thoroughly discussed in Chapter 21. 

12. Chapter 22 includes a discussion of why taking the first differences of a time series
for the purpose of making it stationary may not be the appropriate strategy in some 
situations. 

Besides these specific changes, errors and misprints in the previous editions have been
corrected and the discussions of several topics in the various chapters have been streamlined.

Organization and Options 


The extensive coverage in this edition gives the instructor substantial flexibility in choosing
topics that are appropriate to the intended audience. Here are suggestions about how
this book may be used.

One-semester course for the nonspecialist: Appendix A, Chapters 1 through 9, and an
overview of Chapters 10, 11, and 12 (omitting all the proofs).

One-semester course for economics majors: Appendix A, Chapters 1 through 13. 
Two-semester course for economics majors: Appendices A, B, and C, and Chapters 1 to 22.
Chapters 14 and 16 may be covered on an optional basis. Some of the technical
appendices may be omitted.

Graduate and postgraduate students and researchers: This book is a handy
reference on the major themes in econometrics.

Supplements

A comprehensive website contains the following supplementary material:

- Data from the text, as well as the additional large data sets referenced in the book; the
data will be periodically updated by the authors.

- A Solutions Manual, written by Dawn Porter, providing answers to all of the
questions and problems throughout the text.

- A digital image library containing all of the graphs and figures from the text.

For more information, please go to www.mhhe.com/gujarati5e.
Introduction


1.1 What Is Econometrics? 


Literally interpreted, econometrics means “economic measurement.” Although measurement
is an important part of econometrics, the scope of econometrics is much broader, as
can be seen from the following quotations: 

Econometrics, the result of a certain outlook on the role of economics, consists of the
application of mathematical statistics to economic data to lend empirical support to the models
constructed by mathematical economics and to obtain numerical results. 1

. . . econometrics may be defined as the quantitative analysis of actual economic phenomena
based on the concurrent development of theory and observation, related by appropriate 
methods of inference. 2 

Econometrics may be defined as the social science in which the tools of economic theory, 
mathematics, and statistical inference are applied to the analysis of economic phenomena. 3 

Econometrics is concerned with the empirical determination of economic laws. 4 

The art of the econometrician consists in finding the set of assumptions that are both
sufficiently specific and sufficiently realistic to allow him to take the best possible advantage of the
data available to him. 5

Econometricians ... are a positive help in trying to dispel the poor public image of economics 
(quantitative or otherwise) as a subject in which empty boxes are opened by assuming the 
existence of can-openers to reveal contents which any ten economists will interpret in 
11 ways. 6 

The method of econometric research aims, essentially, at a conjunction of economic theory
and actual measurements, using the theory and technique of statistical inference as a bridge
pier. 7


1 Gerhard Tintner, Methodology of Mathematical Economics and Econometrics, The University of Chicago
Press, Chicago, 1968, p. 74.

2 P. A. Samuelson, T. C. Koopmans, and J. R. N. Stone, "Report of the Evaluative Committee for
Econometrica," Econometrica, vol. 22, no. 2, April 1954, pp. 141-146.

3 Arthur S. Goldberger, Econometric Theory, John Wiley & Sons, New York, 1964, p. 1.

4 H. Theil, Principles of Econometrics, John Wiley & Sons, New York, 1971, p. 1. 

5 E. Malinvaud, Statistical Methods of Econometrics, Rand McNally, Chicago, 1966, p. 514.

6 Adrian C. Darnell and J. Lynne Evans, The Limits of Econometrics, Edward Elgar Publishing, Hants, 
England, 1990, p. 54. 

7 T. Haavelmo, "The Probability Approach in Econometrics," Supplement to Econometrica, vol. 12,
1944, preface p. ii.





1.2 Why a Separate Discipline? 

As the preceding definitions suggest, econometrics is an amalgam of economic theory, 
mathematical economics, economic statistics, and mathematical statistics. Yet the subject 
deserves to be studied in its own right for the following reasons. 

Economic theory makes statements or hypotheses that are mostly qualitative in nature.
For example, microeconomic theory states that, other things remaining the same, a
reduction in the price of a commodity is expected to increase the quantity demanded of that
commodity. Thus, economic theory postulates a negative or inverse relationship between the
price and quantity demanded of a commodity. But the theory itself does not provide any
numerical measure of the relationship between the two; that is, it does not tell by how much
the quantity will go up or down as a result of a certain change in the price of the
commodity. It is the job of the econometrician to provide such numerical estimates. Stated
differently, econometrics gives empirical content to most economic theory.

The main concern of mathematical economics is to express economic theory in
mathematical form (equations) without regard to measurability or empirical verification of the
theory. Econometrics, as noted previously, is mainly interested in the empirical verification 
of economic theory. As we shall see, the econometrician often uses the mathematical 
equations proposed by the mathematical economist but puts these equations in such a form 
that they lend themselves to empirical testing. And this conversion of mathematical into 
econometric equations requires a great deal of ingenuity and practical skill. 

Economic statistics is mainly concerned with collecting, processing, and presenting 
economic data in the form of charts and tables. These are the jobs of the economic
statistician. It is he or she who is primarily responsible for collecting data on gross national
product (GNP), employment, unemployment, prices, and so on. The data thus collected 
constitute the raw data for econometric work. But the economic statistician does not go any 
further, not being concerned with using the collected data to test economic theories. Of 
course, one who does that becomes an econometrician. 

Although mathematical statistics provides many tools used in the trade, the
econometrician often needs special methods in view of the unique nature of most economic data,
namely, that the data are not generated as the result of a controlled experiment. The
econometrician, like the meteorologist, generally depends on data that cannot be controlled
directly. As Spanos correctly observes:

In econometrics the modeler is often faced with observational as opposed to experimental
data. This has two important implications for empirical modeling in econometrics. First, the
modeler is required to master very different skills than those needed for analyzing
experimental data. . . . Second, the separation of the data collector and the data analyst requires the
modeler to familiarize himself/herself thoroughly with the nature and structure of data in question. 8


1.3 Methodology of Econometrics 

How do econometricians proceed in their analysis of an economic problem? That is, what
is their methodology? Although there are several schools of thought on econometric
methodology, we present here the traditional or classical methodology, which still
dominates empirical research in economics and other social and behavioral sciences. 9

8 Aris Spanos, Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, 
Cambridge University Press, United Kingdom, 1999, p. 21. 

9 For an enlightening, if advanced, discussion on econometric methodology, see David F. Hendry, 
Dynamic Econometrics, Oxford University Press, New York, 1995. See also Aris Spanos, op. cit. 
Broadly speaking, traditional econometric methodology proceeds along the following
lines:

1. Statement of theory or hypothesis. 

2. Specification of the mathematical model of the theory. 

3. Specification of the statistical, or econometric, model. 

4. Obtaining the data. 

5. Estimation of the parameters of the econometric model. 

6. Hypothesis testing.

7. Forecasting or prediction. 

8. Using the model for control or policy purposes.

To illustrate the preceding steps, let us consider the well-known Keynesian theory of 
consumption. 

1. Statement of Theory or Hypothesis 

Keynes stated: 

The fundamental psychological law ... is that men [women] are disposed, as a rule and on 
average, to increase their consumption as their income increases, but not as much as the 
increase in their income. 10 

In short, Keynes postulated that the marginal propensity to consume (MPC), the rate of
change of consumption for a unit (say, a dollar) change in income, is greater than zero but 
less than 1. 

2. Specification of the Mathematical Model of Consumption 

Although Keynes postulated a positive relationship between consumption and income, 
he did not specify the precise form of the functional relationship between the two. For 
simplicity, a mathematical economist might suggest the following form of the Keynesian 
consumption function: 


$$Y = \beta_1 + \beta_2 X \qquad 0 < \beta_2 < 1 \tag{1.3.1}$$

where Y = consumption expenditure and X = income, and where $\beta_1$ and $\beta_2$, known as the
parameters of the model, are, respectively, the intercept and slope coefficients.

The slope coefficient $\beta_2$ measures the MPC. Geometrically, Equation 1.3.1 is as shown
in Figure 1.1. This equation, which states that consumption is linearly related to income, is
an example of a mathematical model of the relationship between consumption and income
that is called the consumption function in economics. A model is simply a set of
mathematical equations. If the model has only one equation, as in the preceding example, it is
called a single-equation model, whereas if it has more than one equation, it is known as a
multiple-equation model (the latter will be considered later in the book).

In Eq. (1.3.1) the variable appearing on the left side of the equality sign is called the
dependent variable and the variable(s) on the right side are called the independent, or
explanatory, variable(s). Thus, in the Keynesian consumption function, Eq. (1.3.1),
consumption (expenditure) is the dependent variable and income is the explanatory variable.


10 John Maynard Keynes, The General Theory of Employment, Interest and Money, Harcourt Brace 
Jovanovich, New York, 1936, p. 96. 
FIGURE 1.1 Keynesian consumption function (vertical axis: Y).

3. Specification of the Econometric Model of Consumption

The purely mathematical model of the consumption function given in Eq. (1.3.1) is of
limited interest to the econometrician, for it assumes that there is an exact or deterministic
relationship between consumption and income. But relationships between economic
variables are generally inexact. Thus, if we were to obtain data on consumption expenditure and
disposable (i.e., after-tax) income of a sample of, say, 500 American families and plot these
data on graph paper with consumption expenditure on the vertical axis and disposable
income on the horizontal axis, we would not expect all 500 observations to lie exactly on the
straight line of Eq. (1.3.1) because, in addition to income, other variables affect
consumption expenditure. For example, size of family, ages of the members in the family, family
religion, etc., are likely to exert some influence on consumption.

To allow for the inexact relationships between economic variables, the econometrician 
would modify the deterministic consumption function in Eq. (1.3.1) as follows: 

$$Y = \beta_1 + \beta_2 X + u \tag{1.3.2}$$

where u, known as the disturbance, or error, term, is a random (stochastic) variable that 
has well-defined probabilistic properties. The disturbance term u may well represent all 
those factors that affect consumption but are not taken into account explicitly. 

Equation 1.3.2 is an example of an econometric model. More technically, it is an
example of a linear regression model, which is the major concern of this book. The
econometric consumption function hypothesizes that the dependent variable Y (consumption) is
linearly related to the explanatory variable X (income) but that the relationship between the
two is not exact; it is subject to individual variation.
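To see what the disturbance term u does, the following minimal Python sketch simulates Eq. (1.3.2) for 500 hypothetical families, echoing the example above. The parameter values (intercept 50, slope 0.8), the income range, and the assumption that u is normally distributed with standard deviation 5 are purely illustrative; they are not estimates from any real data set.

```python
import random

# Hypothetical parameters of Eq. (1.3.2), chosen only for illustration.
BETA1 = 50.0   # intercept
BETA2 = 0.8    # slope (the MPC), satisfying 0 < BETA2 < 1
SIGMA_U = 5.0  # assumed standard deviation of the normal disturbance u


def simulate_families(n=500, seed=42):
    """Return (income, consumption, disturbance) lists for n hypothetical families."""
    rng = random.Random(seed)
    income, consumption, disturbance = [], [], []
    for _ in range(n):
        x = rng.uniform(20.0, 100.0)  # disposable income (e.g., thousands of dollars)
        u = rng.gauss(0.0, SIGMA_U)   # all factors other than income
        income.append(x)
        consumption.append(BETA1 + BETA2 * x + u)
        disturbance.append(u)
    return income, consumption, disturbance


income, consumption, u = simulate_families()

# Families whose consumption lies exactly on the deterministic line of Eq. (1.3.1).
on_line = sum(1 for x, y in zip(income, consumption) if y == BETA1 + BETA2 * x)
mean_u = sum(u) / len(u)

print(f"families exactly on the line: {on_line} of {len(income)}")
print(f"average disturbance: {mean_u:.3f}")
```

Essentially no simulated family lies exactly on the line, yet the disturbances average out to roughly zero, which is why the line is read as describing average consumption at each income level.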

The econometric model of the consumption function can be depicted as shown in 
Figure 1.2. 
FIGURE 1.2 Econometric model of the Keynesian consumption function (vertical axis: Y).

4. Obtaining Data 

To estimate the econometric model given in Eq. (1.3.2), that is, to obtain the numerical
values of $\beta_1$ and $\beta_2$, we need data. Although we will have more to say about the crucial
importance of data for economic analysis in the next chapter, for now let us look at the 
data given in Table 1.1, which relate to the U.S. economy for the period 1960-2005. The 
Y variable in this table is the aggregate (for the economy as a whole) personal consumption 
expenditure (PCE) and the X variable is gross domestic product (GDP), a measure of 
aggregate income, both measured in billions of 2000 dollars. Therefore, the data are in 
“real” terms; that is, they are measured in constant (2000) prices. The data are plotted 
in Figure 1.3 (cf. Figure 1.2). For the time being, ignore the line drawn in the figure.

5. Estimation of the Econometric Model 

Now that we have the data, our next task is to estimate the parameters of the consumption 
function. The numerical estimates of the parameters give empirical content to the con¬ 
sumption function. The actual mechanics of estimating the parameters will be discussed in 
Chapter 3. For now, note that the statistical technique of regression analysis is the main 
tool used to obtain the estimates. Using this technique and the data given in Table 1.1, we 
obtain the following estimates of $\beta_1$ and $\beta_2$, namely, -299.5913 and 0.7218. Thus, the
estimated consumption function is: 

$$\hat{Y}_t = -299.5913 + 0.7218 X_t \tag{1.3.3}$$

The hat on the Y indicates that it is an estimate. 11 The estimated consumption function (i.e., 
regression line) is shown in Figure 1.3. 
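The mechanics of least squares are developed in Chapter 3. As a preview, the following Python sketch applies the standard two-variable formulas, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄, to the data of Table 1.1. It uses only the standard library; the function name `ols_fit` is our own, and the results should agree with Eq. (1.3.3) up to rounding.

```python
# Table 1.1: U.S. personal consumption expenditure (Y) and GDP (X),
# 1960-2005, in billions of 2000 dollars.
YEARS = list(range(1960, 2006))
PCE = [1597.4, 1630.3, 1711.1, 1781.6, 1888.4, 2007.7, 2121.8, 2185.0, 2310.5, 2396.4,
       2451.9, 2545.5, 2701.3, 2833.8, 2812.3, 2876.9, 3035.5, 3164.1, 3303.1, 3383.4,
       3374.1, 3422.2, 3470.3, 3668.6, 3863.3, 4064.0, 4228.9, 4369.8, 4546.9, 4675.0,
       4770.3, 4778.4, 4934.8, 5099.8, 5290.7, 5433.5, 5619.4, 5831.8, 6125.8, 6438.6,
       6739.4, 6910.4, 7099.3, 7295.3, 7577.1, 7841.2]
GDP = [2501.8, 2560.0, 2715.2, 2834.0, 2998.6, 3191.1, 3399.1, 3484.6, 3652.7, 3765.4,
       3771.9, 3898.6, 4105.0, 4341.5, 4319.6, 4311.2, 4540.9, 4750.5, 5015.0, 5173.4,
       5161.7, 5291.7, 5189.3, 5423.8, 5813.6, 6053.7, 6263.6, 6475.1, 6742.7, 6981.4,
       7112.5, 7100.5, 7336.6, 7532.7, 7835.5, 8031.7, 8328.9, 8703.5, 9066.9, 9470.3,
       9817.0, 9890.7, 10048.8, 10301.0, 10703.5, 11048.6]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx           # slope: the estimated MPC
    b1 = y_bar - b2 * x_bar  # the fitted line passes through the sample means
    return b1, b2


assert len(YEARS) == len(PCE) == len(GDP) == 46
b1, b2 = ols_fit(GDP, PCE)
print(f"Y-hat = {b1:.4f} + {b2:.4f} X")
```

The estimated slope of about 0.72 is the MPC interpreted in the discussion of Figure 1.3.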


11 As a matter of convention, a hat over a variable or parameter indicates that it is an estimated value.
TABLE 1.1 Data on Y (Personal Consumption Expenditure) and X (Gross Domestic Product),
1960-2005, both in billions of 2000 dollars

Year    PCE (Y)    GDP (X)
1960     1597.4     2501.8
1961     1630.3     2560.0
1962     1711.1     2715.2
1963     1781.6     2834.0
1964     1888.4     2998.6
1965     2007.7     3191.1
1966     2121.8     3399.1
1967     2185.0     3484.6
1968     2310.5     3652.7
1969     2396.4     3765.4
1970     2451.9     3771.9
1971     2545.5     3898.6
1972     2701.3     4105.0
1973     2833.8     4341.5
1974     2812.3     4319.6
1975     2876.9     4311.2
1976     3035.5     4540.9
1977     3164.1     4750.5
1978     3303.1     5015.0
1979     3383.4     5173.4
1980     3374.1     5161.7
1981     3422.2     5291.7
1982     3470.3     5189.3
1983     3668.6     5423.8
1984     3863.3     5813.6
1985     4064.0     6053.7
1986     4228.9     6263.6
1987     4369.8     6475.1
1988     4546.9     6742.7
1989     4675.0     6981.4
1990     4770.3     7112.5
1991     4778.4     7100.5
1992     4934.8     7336.6
1993     5099.8     7532.7
1994     5290.7     7835.5
1995     5433.5     8031.7
1996     5619.4     8328.9
1997     5831.8     8703.5
1998     6125.8     9066.9
1999     6438.6     9470.3
2000     6739.4     9817.0
2001     6910.4     9890.7
2002     7099.3    10048.8
2003     7295.3    10301.0
2004     7577.1    10703.5
2005     7841.2    11048.6

Source: Economic Report of the President.
FIGURE 1.3 Personal consumption expenditure (Y) in relation to GDP (X), 1960-2005, in
billions of 2000 dollars.
As Figure 1.3 shows, the regression line fits the data quite well in that the data points are
very close to the regression line. From this figure we see that for the period 1960-2005 the
slope coefficient (i.e., the MPC) was about 0.72, suggesting that for the sample period an
increase in real income of one dollar led, on average, to an increase of about 72 cents in real
consumption expenditure. 12 We say on average because the relationship between
consumption and income is inexact; as is clear from Figure 1.3, not all the data points lie
exactly on the regression line. In simple terms we can say that, according to our data, the
average, or mean, consumption expenditure went up by about 72 cents for a dollar’s
increase in real income.

6. Hypothesis Testing 

Assuming that the fitted model is a reasonably good approximation of reality, we have to
develop suitable criteria to find out whether the estimates obtained in, say, Equation 1.3.3
are in accord with the expectations of the theory that is being tested. According to
“positive” economists like Milton Friedman, a theory or hypothesis that is not verifiable by
appeal to empirical evidence may not be admissible as a part of scientific enquiry. 13

As noted earlier, Keynes expected the MPC to be positive but less than 1. In our
example we found the MPC to be about 0.72. But before we accept this finding as confirmation
of Keynesian consumption theory, we must enquire whether this estimate is sufficiently
below unity to convince us that this is not a chance occurrence or peculiarity of the
particular data we have used. In other words, is 0.72 statistically less than 1? If it is, it may
support Keynes’s theory.

12 Do not worry now about how these values were obtained. As we show in Chapter 3, the statistical
method of least squares has produced these estimates. Also, for now do not worry about the
negative value of the intercept.

13 See Milton Friedman, "The Methodology of Positive Economics," Essays in Positive Economics,
University of Chicago Press, Chicago, 1953.

Such confirmation or refutation of economic theories on the basis of sample evidence is 
based on a branch of statistical theory known as statistical inference (hypothesis testing). 
Throughout this book we shall see how this inference process is actually conducted. 
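As a preview of the inference machinery developed later in the book, the question "is 0.72 statistically less than 1?" is usually framed as a one-sided t test on the slope. The sketch below states only the test statistic; the standard error $\operatorname{se}(\hat\beta_2)$ is derived in later chapters and is not computed here.

$$H_0\colon \beta_2 \ge 1 \quad\text{versus}\quad H_1\colon \beta_2 < 1, \qquad t = \frac{\hat\beta_2 - 1}{\operatorname{se}(\hat\beta_2)}$$

Under the classical assumptions (including normally distributed errors), $H_0$ is rejected at significance level $\alpha$ if $t < -t_{\alpha,\,n-2}$; with the 46 annual observations of Table 1.1, $n - 2 = 44$.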

7. Forecasting or Prediction 

If the chosen model does not refute the hypothesis or theory under consideration, we may 
use it to predict the future value(s) of the dependent, or forecast, variable Y on the basis of 
the known or expected future value(s) of the explanatory, or predictor, variable X. 

To illustrate, suppose we want to predict the mean consumption expenditure for 2006. 
The GDP value for 2006 was 11319.4 billion dollars. 14 Putting this GDP figure on the 
right-hand side of Eq. (1.3.3), we obtain: 

$$\hat{Y}_{2006} = -299.5913 + 0.7218(11319.4) = 7870.7516 \tag{1.3.4}$$


or about 7870 billion dollars. Thus, given the value of the GDP, the mean, or average,
forecast consumption expenditure is about 7870 billion dollars. The actual value of the
consumption expenditure reported in 2006 was 8044 billion dollars. The estimated model
Eq. (1.3.3) thus underpredicted the actual consumption expenditure by about 174 billion 
dollars. We could say the forecast error is about 174 billion dollars, which is about 
1.5 percent of the actual GDP value for 2006. When we fully discuss the linear regression 
model in subsequent chapters, we will try to find out if such an error is “small” or “large.” 
But what is important for now is to note that such forecast errors are inevitable given the 
statistical nature of our analysis. 
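The forecast arithmetic of Eq. (1.3.4) can be reproduced with a few lines of Python. This is a minimal sketch: the coefficients are those of Eq. (1.3.3), the 2006 figures are the ones quoted in the text, and the function name `predict_consumption` is hypothetical.

```python
# Estimated coefficients from Eq. (1.3.3).
B1_HAT = -299.5913
B2_HAT = 0.7218


def predict_consumption(gdp):
    """Mean consumption expenditure implied by Eq. (1.3.3), billions of 2000 dollars."""
    return B1_HAT + B2_HAT * gdp


gdp_2006 = 11319.4          # 2006 GDP, deliberately left out of the estimation sample
actual_pce_2006 = 8044.0    # actual 2006 consumption expenditure quoted in the text

forecast = predict_consumption(gdp_2006)
error = actual_pce_2006 - forecast  # positive: the model underpredicts

print(f"forecast: {forecast:.4f}")
print(f"forecast error: {error:.2f}")
print(f"error as percent of 2006 GDP: {100 * error / gdp_2006:.2f}")
```

Without rounding, the error is about 173 billion dollars; the figure of about 174 in the text comes from first rounding the forecast to 7870.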

There is another use of the estimated model Eq. (1.3.3). Suppose the president decides 
to propose a reduction in the income tax. What will be the effect of such a policy on income 
and thereby on consumption expenditure and ultimately on employment? 

Suppose that, as a result of the proposed policy change, investment expenditure
increases. What will be the effect on the economy? As macroeconomic theory shows, the
change in income following, say, a dollar’s worth of change in investment expenditure is
given by the income multiplier M, which is defined as

$$M = \frac{1}{1 - \text{MPC}} \tag{1.3.5}$$

If we use the MPC of 0.72 obtained in Eq. (1.3.3), this multiplier becomes M ≈ 3.57.
That is, an increase (decrease) of a dollar in investment will eventually lead to more than a
threefold increase (decrease) in income; note that it takes time for the multiplier to work.
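Eq. (1.3.5) is simple to evaluate. The Python sketch below (the function name `income_multiplier` is our own) reproduces the figure of about 3.57 and also shows the value obtained with the unrounded slope from Eq. (1.3.3).

```python
def income_multiplier(mpc):
    """Return the income multiplier M = 1 / (1 - MPC) of Eq. (1.3.5)."""
    if not 0 < mpc < 1:
        raise ValueError("MPC must lie strictly between 0 and 1")
    return 1.0 / (1.0 - mpc)


m_rounded = income_multiplier(0.72)     # MPC rounded as in the text
m_estimate = income_multiplier(0.7218)  # unrounded slope from Eq. (1.3.3)

print(f"M with MPC = 0.72:   {m_rounded:.2f}")   # 3.57
print(f"M with MPC = 0.7218: {m_estimate:.2f}")  # 3.59
```

Because dM/dMPC = 1/(1 - MPC)², the multiplier becomes more sensitive to the estimated MPC the closer that estimate is to 1.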

The critical value in this computation is MPC, for the multiplier depends on it. And this 
estimate of the MPC can be obtained from regression models such as Eq. (1.3.3). Thus, a 
quantitative estimate of MPC provides valuable information for policy purposes. Knowing 
MPC, one can predict the future course of income, consumption expenditure, and
employment following a change in the government’s fiscal policies.


14 Data on PCE and GDP were available for 2006 but we purposely left them out to illustrate the topic 
discussed in this section. As we will discuss in subsequent chapters, it is a good idea to save a portion 
of the data to find out how well the fitted model predicts the out-of-sample observations. 
8. Use of the Model for Control or Policy Purposes

Suppose we have the estimated consumption function given in Eq. (1.3.3). Suppose further
the government believes that consumer expenditure of about 8750 (billions of 2000 dollars)
will keep the unemployment rate at its current level of about 4.2 percent (early 2006). What
level of income will guarantee the target amount of consumption expenditure?

If the regression results given in Eq. (1.3.3) seem reasonable, simple arithmetic will
show that

$$8750 = -299.5913 + 0.7218(\text{GDP}_{2006}) \tag{1.3.6}$$

which gives X = 12537, approximately. That is, an income level of about 12537 (billion) 
dollars, given an MPC of about 0.72, will produce an expenditure of about 8750 billion 
dollars. 
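Solving Eq. (1.3.6) for GDP is a one-line inversion of the estimated line, X = (Y* − β̂1)/β̂2, where Y* is the target consumption level. A minimal Python sketch, in which the target of 8750 is the figure from the text and the function name `required_gdp` is hypothetical:

```python
# Estimated coefficients from Eq. (1.3.3).
B1_HAT = -299.5913
B2_HAT = 0.7218


def required_gdp(target_consumption):
    """Solve target = B1_HAT + B2_HAT * X for X, the control variable."""
    return (target_consumption - B1_HAT) / B2_HAT


x_needed = required_gdp(8750.0)
print(f"GDP needed: {x_needed:.1f} billion 2000 dollars")  # 12537.5
```

The inversion treats the estimated coefficients as if they were exact; in practice the uncertainty in them carries over to the required GDP level.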

As these calculations suggest, an estimated model may be used for control, or policy, 
purposes. By appropriate fiscal and monetary policy mix, the government can manipulate 
the control variable X to produce the desired level of the target variable Y. 

Figure 1.4 summarizes the anatomy of classical econometric modeling. 


Choosing among Competing Models 

When a governmental agency (e.g., the U.S. Department of Commerce) collects economic
data, such as that shown in Table 1.1, it does not necessarily have any economic theory in
mind. How then does one know that the data really support the Keynesian theory of
consumption? Is it because the Keynesian consumption function (i.e., the regression line)
shown in Figure 1.3 is extremely close to the actual data points? Is it possible that another
consumption model (theory) might fit the data equally well? For example, Milton
Friedman has developed a model of consumption, called the permanent income
hypothesis. 15 Robert Hall has also developed a model of consumption, called the life-cycle
permanent income hypothesis. 16 Could one or both of these models also fit the data in
Table 1.1?

FIGURE 1.4 Anatomy of econometric modeling.

In short, the question facing a researcher in practice is how to choose among competing
hypotheses or models of a given phenomenon, such as the consumption-income
relationship. As Miller contends:

No encounter with data is [a] step towards genuine confirmation unless the hypothesis does a
better job of coping with the data than some natural rival. . . . What strengthens a hypothesis,
here, is a victory that is, at the same time, a defeat for a plausible rival. 17

How then does one choose among competing models or hypotheses? Here the advice given 
by Clive Granger is worth keeping in mind: 18 

I would like to suggest that in the future, when you are presented with a new piece of theory or
empirical model, you ask these questions:

(i) What purpose does it have? What economic decisions does it help with? 

(ii) Is there any evidence being presented that allows me to evaluate its quality compared to 
alternative theories or models? 

I think attention to such questions will strengthen economic research and discussion. 

As we progress through this book, we will come across several competing hypotheses 
trying to explain various economic phenomena. For example, students of economics are 
familiar with the concept of the production function, which is basically a relationship 
between output and inputs (say, capital and labor). In the literature, two of the best known 
are the Cobb-Douglas and the constant elasticity of substitution production functions. 
Given the data on output and inputs, we will have to find out which of the two production 
functions, if any, fits the data well. 

The eight-step classical econometric methodology discussed above is neutral in the 
sense that it can be used to test any of these rival hypotheses. 

Is it possible to develop a methodology that is comprehensive enough to include 
competing hypotheses? This is an involved and controversial topic. We will discuss it in 
Chapter 13, after we have acquired the necessary econometric theory. 


1.4 Types of Econometrics 

As the classificatory scheme in Figure 1.5 suggests, econometrics may be divided into two 
broad categories: theoretical econometrics and applied econometrics. In each category, 
one can approach the subject in the classical or Bayesian tradition. In this book the 
emphasis is on the classical approach. For the Bayesian approach, the reader may consult 
the references given at the end of the chapter. 


15 Milton Friedman, A Theory of the Consumption Function, Princeton University Press, Princeton, N.J.,
1957.

16 R. Hall, "Stochastic Implications of the Life Cycle Permanent Income Hypothesis: Theory and 
Evidence," Journal of Political Economy, vol. 86, 1978, pp. 971-987. 

17 R. W. Miller, Fact and Method: Explanation, Confirmation, and Reality in the Natural and Social 
Sciences, Princeton University Press, Princeton, N.J., 1978, p. 176. 

18 Clive W. J. Granger, Empirical Modeling in Economics, Cambridge University Press, U.K., 1999, p. 58. 
FIGURE 1.5 

Categories of econometrics. 


Theoretical econometrics is concerned with the development of appropriate methods for 
measuring economic relationships specified by econometric models. In this respect, 
econometrics leans heavily on mathematical statistics. For example, one of the methods used 
extensively in this book is least squares. Theoretical econometrics must spell out the 
assumptions of this method, its properties, and what happens to these properties when one 
or more of the assumptions of the method are not fulfilled. 

In applied econometrics we use the tools of theoretical econometrics to study some 
special field(s) of economics and business, such as the production function, investment 
function, demand and supply functions, portfolio theory, etc. 

This book is concerned largely with the development of econometric methods, their 
assumptions, their uses, and their limitations. These methods are illustrated with examples 
from various areas of economics and business. But this is not a book of applied 
econometrics in the sense that it delves deeply into any particular field of economic application. That 
job is best left to books written specifically for this purpose. References to some of these 
books are provided at the end of this book. 




1.5 Mathematical and Statistical Prerequisites 

Although this book is written at an elementary level, the author assumes that the reader is 
familiar with the basic concepts of statistical estimation and hypothesis testing. However, a 
broad but nontechnical overview of the basic statistical concepts used in this book is 
provided in Appendix A for the benefit of those who want to refresh their knowledge. Insofar 
as mathematics is concerned, a nodding acquaintance with the notions of differential 
calculus is desirable, although not essential. Although most graduate-level books in 
econometrics make heavy use of matrix algebra, I want to make it clear that it is not needed to 
study this book. It is my strong belief that the fundamental ideas of econometrics can be 
conveyed without the use of matrix algebra. However, for the benefit of the mathematically 
inclined student, Appendix C gives a summary of basic regression theory in matrix 
notation. For these students, Appendix B provides a succinct summary of the main results 
from matrix algebra. 


1.6 The Role of the Computer 

Regression analysis, the bread-and-butter tool of econometrics, is unthinkable these days 
without the computer and some access to statistical software. (Believe me, I grew up in the 
generation of the slide rule!) Fortunately, several excellent regression packages are 
commercially available, both for the mainframe and the microcomputer, and the list is growing 
by the day. Regression software packages, such as ET, LIMDEP, SHAZAM, MICRO 
TSP, MINITAB, EVIEWS, SAS, SPSS, STATA, Microfit, PcGive, and BMD have most 
of the econometric techniques and tests discussed in this book. 
In this book, from time to time, the reader will be asked to conduct Monte Carlo 
experiments using one or more of the statistical packages. Monte Carlo experiments are 
“fun” exercises that will enable the reader to appreciate the properties of several statistical 
methods discussed in this book. The details of the Monte Carlo experiments will be 
discussed at appropriate places. 
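As a rough illustration of what such an experiment involves, here is a minimal Python sketch that does not depend on any of the packages listed above. It assumes a hypothetical two-variable model Y = beta1 + beta2*X + u with made-up values beta1 = 20, beta2 = 0.6, and an error standard deviation of 5. The X values are held fixed while the error term is redrawn 1,000 times, and the slope is re-estimated each time by least squares (the method formally introduced in Chapter 3). The function names ols_slope and monte_carlo_slopes are illustrative only.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x: sum of cross-deviations divided by
    the sum of squared deviations of x from its mean."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def monte_carlo_slopes(beta1=20.0, beta2=0.6, sigma=5.0, reps=1000, seed=42):
    """Draw `reps` samples from the hypothetical model Y = beta1 + beta2*X + u,
    u ~ N(0, sigma^2), using the same X values every time, and return the
    estimated slope from each sample."""
    rng = random.Random(seed)
    x = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]  # fixed in repeated sampling
    slopes = []
    for _ in range(reps):
        # Only the error term u (and therefore Y) changes from sample to sample.
        y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

slopes = monte_carlo_slopes()
print(f"mean of estimated slopes: {statistics.fmean(slopes):.3f} (true value 0.6)")
print(f"std. dev. of estimated slopes: {statistics.stdev(slopes):.4f}")
```

If the estimator behaves well, the average of the 1,000 slope estimates should sit close to the true value 0.6. Changing sigma or reps is an easy way to see how the spread of the estimates responds.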


1.7 Suggestions for Further Reading 

The topic of econometric methodology is vast and controversial. For those interested in this 
topic, I suggest the following books: 

Neil de Marchi and Christopher Gilbert, eds., History and Methodology of Econometrics, 
Oxford University Press, New York, 1989. This collection of readings discusses some 
early work on econometric methodology and has an extended discussion of the British 
approach to econometrics relating to time series data, that is, data collected over a period 
of time. 

Wojciech W. Charemza and Derek F. Deadman, New Directions in Econometric 
Practice: General to Specific Modelling, Cointegration and Vector Autoregression, 2d ed., 
Edward Elgar Publishing Ltd., Hants, England, 1997. The authors of this book critique the 
traditional approach to econometrics and give a detailed exposition of new approaches to 
econometric methodology. 

Adrian C. Darnell and J. Lynne Evans, The Limits of Econometrics, Edward Elgar 
Publishing Ltd., Hants, England, 1990. The book provides a somewhat balanced discussion 
of the various methodological approaches to econometrics, with renewed allegiance to 
traditional econometric methodology. 

Mary S. Morgan, The History of Econometric Ideas, Cambridge University Press, New 
York, 1990. The author provides an excellent historical perspective on the theory and 
practice of econometrics, with an in-depth discussion of the early contributions of Haavelmo 
(1989 Nobel Laureate in Economics) to econometrics. In the same spirit, David F. Hendry 
and Mary S. Morgan, The Foundations of Econometric Analysis, Cambridge University 
Press, U.K., 1995, have collected seminal writings in econometrics to show the evolution of 
econometric ideas over time. 

David Colander and Reuven Brenner, eds., Educating Economists, University of 
Michigan Press, Ann Arbor, Michigan, 1992. This text presents a critical, at times agnostic, 
view of economic teaching and practice. 

For Bayesian statistics and econometrics, the following books are very useful: John D. 
Hey, Data in Doubt, Basil Blackwell Ltd., Oxford University Press, England, 1985; Peter 
M. Lee, Bayesian Statistics: An Introduction, Oxford University Press, England, 1989; and 
Dale J. Poirier, Intermediate Statistics and Econometrics: A Comparative Approach, MIT 
Press, Cambridge, Massachusetts, 1995. Arnold Zellner, An Introduction to Bayesian 
Inference in Econometrics, John Wiley & Sons, New York, 1971, is an advanced reference book. 
Another advanced reference book is the Palgrave Handbook of Econometrics: Volume 1: 
Econometric Theory, edited by Terence C. Mills and Kerry Patterson, Palgrave Macmillan, 
New York, 2007. 



Part 1 

Single-Equation Regression Models 


Part 1 of this text introduces single-equation regression models. In these models, one 
variable, called the dependent variable, is expressed as a linear function of one or more 
other variables, called the explanatory variables. In such models it is assumed implicitly 
that causal relationships, if any, between the dependent and explanatory variables flow in 
one direction only, namely, from the explanatory variables to the dependent variable. 

In Chapter 1, we discuss the historical as well as the modern interpretation of the term 
regression and illustrate the difference between the two interpretations with several 
examples drawn from economics and other fields. 

In Chapter 2, we introduce some fundamental concepts of regression analysis with the 
aid of the two-variable linear regression model, a model in which the dependent variable is 
expressed as a linear function of only a single explanatory variable. 

In Chapter 3, we continue to deal with the two-variable model and introduce what is 
known as the classical linear regression model, a model that makes several simplifying 
assumptions. With these assumptions, we introduce the method of ordinary least squares 
(OLS) to estimate the parameters of the two-variable regression model. The method of OLS 
is simple to apply, yet it has some very desirable statistical properties. 

In Chapter 4, we introduce the (two-variable) classical normal linear regression model, 
a model that assumes that the random dependent variable follows the normal probability 
distribution. With this assumption, the OLS estimators obtained in Chapter 3 possess 
some stronger statistical properties than they do in the nonnormal classical linear regression 
model, properties that enable us to engage in statistical inference, namely, hypothesis testing. 

Chapter 5 is devoted to the topic of hypothesis testing. In this chapter, we try to find out 
whether the estimated regression coefficients are compatible with the hypothesized values 
of such coefficients, the hypothesized values being suggested by theory and/or prior 
empirical work. 

Chapter 6 considers some extensions of the two-variable regression model. In 
particular, it discusses topics such as (1) regression through the origin, (2) scaling and units of 
measurement, and (3) functional forms of regression models such as double-log, semilog, 
and reciprocal models. 
In Chapter 7, we consider the multiple regression model, a model in which there is 
more than one explanatory variable, and show how the method of OLS can be extended to 
estimate the parameters of such models. 

In Chapter 8, we extend the concepts introduced in Chapter 5 to the multiple regression 
model and point out some of the complications arising from the introduction of several 
explanatory variables. 

Chapter 9 on dummy, or qualitative, explanatory variables concludes Part 1 of the text. 
This chapter emphasizes that not all explanatory variables need to be quantitative (i.e., ratio 
scale). Variables, such as gender, race, religion, nationality, and region of residence, 
cannot be readily quantified, yet they play a valuable role in explaining many an economic 
phenomenon. 


Chapter 1 

The Nature of Regression Analysis 

As mentioned in the Introduction, regression is a main tool of econometrics, and in this 
chapter we consider very briefly the nature of this tool. 


1.1 Historical Origin of the Term Regression 

The term regression was introduced by Francis Galton. In a famous paper, Galton found 
that, although there was a tendency for tall parents to have tall children and for short 
parents to have short children, the average height of children born of parents of a given height 
tended to move or “regress” toward the average height in the population as a whole. 1 In 
other words, the height of the children of unusually tall or unusually short parents tends to 
move toward the average height of the population. Galton’s law of universal regression was 
confirmed by his friend Karl Pearson, who collected more than a thousand records of 
heights of members of family groups. 2 He found that the average height of sons of a group 
of tall fathers was less than their fathers’ height and the average height of sons of a group 
of short fathers was greater than their fathers’ height, thus “regressing” tall and short sons 
alike toward the average height of all men. In the words of Galton, this was “regression to 
mediocrity.” 


1.2 The Modern Interpretation of Regression 

The modern interpretation of regression is, however, quite different. Broadly speaking, we 
may say 

Regression analysis is concerned with the study of the dependence of one variable, the 
dependent variable, on one or more other variables, the explanatory variables, with a view to 
estimating and/or predicting the (population) mean or average value of the former in terms of 
the known or fixed (in repeated sampling) values of the latter. 


1 Francis Galton, "Family Likeness in Stature," Proceedings of Royal Society, London, vol. 40, 1886, 
pp. 42-72. 

2 K. Pearson and A. Lee, "On the Laws of Inheritance," Biometrika, vol. 2, Nov. 1903, pp. 357-462. 
The full import of this view of regression analysis will become clearer as we progress, but 
a few simple examples will make the basic concept quite clear. 

Examples 

1. Reconsider Galton’s law of universal regression. Galton was interested in finding out 
why there was stability in the distribution of heights in a population. But in the modern 
view our concern is not with this explanation but rather with finding out how the average 
height of sons changes, given the fathers’ height. In other words, our concern is with 
predicting the average height of sons knowing the height of their fathers. To see how this can 
be done, consider Figure 1.1, which is a scatter diagram, or scattergram. This figure 
shows the distribution of heights of sons in a hypothetical population corresponding to the 
given or fixed values of the father’s height. Notice that corresponding to any given height of 
a father is a range or distribution of the heights of the sons. However, notice that despite the 
variability of the height of sons for a given value of father’s height, the average height of 
sons generally increases as the height of the father increases. To show this clearly, the 
circled crosses in the figure indicate the average height of sons corresponding to a given 
height of the father. Connecting these averages, we obtain the line shown in the figure. This 
line, as we shall see, is known as the regression line. It shows how the average height of 
sons increases with the father’s height. 3 

2. Consider the scattergram in Figure 1.2, which gives the distribution in a hypothetical 
population of heights of boys measured at fixed ages. Corresponding to any given age, we 
have a range, or distribution, of heights. Obviously, not all boys of a given age are likely to 
have identical heights. But height on the average increases with age (of course, up to a 
certain age), which can be seen clearly if we draw a line (the regression line) through the 
circled points that represent the average height at the given ages. Thus, knowing the age, we 
may be able to predict from the regression line the average height corresponding to that age. 


FIGURE 1.1 

Hypothetical distribution of sons’ heights corresponding to given heights of fathers. 


3 At this stage of the development of the subject matter, we shall call this regression line simply the 
line connecting the mean, or average, value of the dependent variable (son's height) corresponding to 
the given value of the explanatory variable (father's height). Note that this line has a positive slope but 
the slope is less than 1, which is in conformity with Galton's regression to mediocrity. (Why?) 


FIGURE 1.2 

3. Turning to economic examples, an economist may be interested in studying the 
dependence of personal consumption expenditure on aftertax or disposable real personal 
income. Such an analysis may be helpful in estimating the marginal propensity to consume 
(MPC), that is, the average change in consumption expenditure for, say, a dollar’s worth of 
change in real income (see Figure 1.3). 

4. A monopolist who can fix the price or output (but not both) may want to find out 
the response of the demand for a product to changes in price. Such an experiment may 
enable the estimation of the price elasticity (i.e., price responsiveness) of the demand for the 
product and may help determine the most profitable price. 

5. A labor economist may want to study the rate of change of money wages in relation to 
the unemployment rate. The historical data are shown in the scattergram given in Figure 1.3. 
The curve in Figure 1.3 is an example of the celebrated Phillips curve relating changes in the 
money wages to the unemployment rate. Such a scattergram may enable the labor economist 
to predict the average change in money wages given a certain unemployment rate. Such 
knowledge may be helpful in stating something about the inflationary process in an 
economy, for increases in money wages are likely to be reflected in increased prices. 

6. From monetary economics it is known that, other things remaining the same, the 
higher the rate of inflation π, the lower the proportion k of their income that people would 
want to hold in the form of money, as depicted in Figure 1.4. The slope of this line 
represents the change in k given a change in the inflation rate. A quantitative analysis of this 
relationship will enable the monetary economist to predict the amount of money, as a 
proportion of their income, that people would want to hold at various rates of inflation. 

7. The marketing director of a company may want to know how the demand for the 
company’s product is related to, say, advertising expenditure. Such a study will be of 
considerable help in finding out the elasticity of demand with respect to advertising 
expenditure, that is, the percent change in demand in response to, say, a 1 percent change in 
the advertising budget. This knowledge may be helpful in determining the “optimum” 
advertising budget. 
8. Finally, an agronomist may be interested in studying the dependence of a particular 
crop yield, say, of wheat, on temperature, rainfall, amount of sunshine, and fertilizer. Such 
a dependence analysis may enable the prediction or forecasting of the average crop yield, 
given information about the explanatory variables. 
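Example 1 above describes the regression line as the line connecting the average height of sons at each given height of fathers. A minimal Python sketch of that averaging step follows; the (father, son) height pairs are invented for illustration only, and the function name conditional_means is hypothetical. It groups sons' heights by each fixed father's height and averages them, which is what the circled crosses in Figure 1.1 represent.

```python
from collections import defaultdict

# Invented (father's height, son's height) pairs, in inches, for illustration only.
pairs = [
    (64, 66), (64, 65), (64, 67),
    (68, 68), (68, 69), (68, 67),
    (72, 70), (72, 71), (72, 72),
]

def conditional_means(data):
    """Average son's height (Y) at each fixed father's height (X)."""
    groups = defaultdict(list)
    for x, y in data:
        groups[x].append(y)
    # Sort by X so the averages can be read off (or plotted) in order.
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}

means = conditional_means(pairs)
print(means)  # {64: 66.0, 68: 68.0, 72: 71.0}
```

With these made-up numbers the averages rise from 66 to 71 as fathers' heights go from 64 to 72, a slope of 5/8 = 0.625 between the end points: positive but less than 1, as footnote 3 describes.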

The reader can supply scores of such examples of the dependence of one variable on one 
or more other variables. The techniques of regression analysis discussed in this text are 
specially designed to study such dependence among variables. 
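Example 7 above defines the advertising elasticity as a ratio of percentage changes. A minimal sketch, assuming just two hypothetical observations and using the midpoint (arc) formula, which is one common convention among several; with many observations one would instead estimate the elasticity from a regression, for example the double-log form listed for Chapter 6. The function name arc_elasticity is illustrative.

```python
def arc_elasticity(q0, q1, a0, a1):
    """Midpoint (arc) elasticity of demand q with respect to advertising a:
    the percent change in q divided by the percent change in a, each change
    measured relative to the average of its two values."""
    pct_change_q = (q1 - q0) / ((q0 + q1) / 2)
    pct_change_a = (a1 - a0) / ((a0 + a1) / 2)
    return pct_change_q / pct_change_a

# Hypothetical figures: the advertising budget rises from 100 to 110,
# and units demanded rise from 1,000 to 1,020.
e = arc_elasticity(q0=1000, q1=1020, a0=100, a1=110)
print(round(e, 3))  # 0.208
```

With these made-up numbers the elasticity is about 0.21: a 1 percent rise in advertising is associated with roughly a 0.21 percent rise in demand.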
1.3 Statistical versus Deterministic Relationships 

From the examples cited in Section 1.2, the reader will notice that in regression analysis 
we are concerned with what is known as the statistical, not functional or deterministic, 
dependence among variables, such as those of classical physics. In statistical 
relationships among variables we essentially deal with random or stochastic 4 variables, that is, 
variables that have probability distributions. In functional or deterministic dependency, 
on the other hand, we also deal with variables, but these variables are not random or 
stochastic. 

The dependence of crop yield on temperature, rainfall, sunshine, and fertilizer, for 
example, is statistical in nature in the sense that the explanatory variables, although 
certainly important, will not enable the agronomist to predict crop yield exactly because of 
errors involved in measuring these variables as well as a host of other factors (variables) 
that collectively affect the yield but may be difficult to identify individually. Thus, there is 
bound to be some “intrinsic” or random variability in the dependent variable, crop yield, that 
cannot be fully explained no matter how many explanatory variables we consider. 

In deterministic phenomena, on the other hand, we deal with relationships of the type, 
say, exhibited by Newton’s law of gravity, which states: Every particle in the universe 
attracts every other particle with a force directly proportional to the product of their masses 
and inversely proportional to the square of the distance between them. Symbolically, 
$F = k\left(\frac{m_1 m_2}{r^2}\right)$, where $F$ = force, $m_1$ and $m_2$ are the masses of the two particles, $r$ = 
distance, and $k$ = constant of proportionality. Another example is Ohm’s law, which states: 
For metallic conductors over a limited range of temperature the current $C$ is proportional to 
the voltage $V$; that is, $C = \left(\frac{1}{k}\right)V$, where $1/k$ is the constant of proportionality. Other examples 
of such deterministic relationships are Boyle’s gas law, Kirchhoff’s law of electricity, and 
Newton’s law of motion. 

In this text we are not concerned with such deterministic relationships. Of course, if 
there are errors of measurement, say, in the k of Newton’s law of gravity, the otherwise 
deterministic relationship becomes a statistical relationship. In this situation, force can be 
predicted only approximately from the given value of $k$ (and $m_1$, $m_2$, and $r$), which contains 
errors. The variable F in this case becomes a random variable. 
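The contrast between the deterministic law F = k(m1 m2 / r^2) and its statistical counterpart can be simulated directly. This Python sketch uses the standard value of the gravitational constant for k and round, hypothetical values for the masses and distance; the 1 percent random measurement error in k is likewise an assumption chosen for illustration.

```python
import random

K = 6.674e-11  # gravitational constant (N m^2 / kg^2), playing the role of k

def gravity_force(k, m1, m2, r):
    """Newton's law of gravity: F = k * (m1 * m2 / r**2)."""
    return k * (m1 * m2 / r ** 2)

# Hypothetical masses (kg) and distance (m), chosen for illustration.
m1, m2, r = 5.0e24, 7.0e22, 3.8e8

# Deterministic: the same inputs always give the same force.
exact = gravity_force(K, m1, m2, r)

# Statistical: if k is measured with a (hypothetical) 1 percent random error,
# the computed force differs from one measurement to the next.
rng = random.Random(0)
noisy = [gravity_force(K * (1 + rng.gauss(0, 0.01)), m1, m2, r) for _ in range(5)]

print(f"exact F = {exact:.4e} N")
for f in noisy:
    print(f"noisy F = {f:.4e} N")
```

The first computation returns the same F every time; the second returns a different F on each draw, which is the sense in which F becomes a random variable.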


1.4 Regression versus Causation 


Although regression analysis deals with the dependence of one variable on other variables, 
it does not necessarily imply causation. In the words of Kendall and Stuart, “A statistical 
relationship, however strong and however suggestive, can never establish causal 
connection: our ideas of causation must come from outside statistics, ultimately from some theory 
or other.” 5 


4 The word stochastic comes from the Greek word stokhos meaning "a bull's eye." The outcome of 
throwing darts on a dart board is a stochastic process, that is, a process fraught with misses. 

5 M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Charles Griffin Publishers, New York, 
vol. 2, 1961, chap. 26, p. 279. 
In the crop-yield example cited previously, there is no statistical reason to assume that 
rainfall does not depend on crop yield. The fact that we treat crop yield as dependent on 
rainfall (among other things) is due to nonstatistical considerations: Common sense 
suggests that the relationship cannot be reversed, for we cannot control rainfall by varying 
crop yield. 

In all the examples cited in Section 1.2 the point to note is that a statistical relationship 
in itself cannot logically imply causation. To ascribe causality, one must appeal to a priori 
or theoretical considerations. Thus, in the third example cited, one can invoke economic 
theory in saying that consumption expenditure depends on real income. 6 


1.5 Regression versus Correlation 

Closely related to but conceptually very much different from regression analysis is 
correlation analysis, where the primary objective is to measure the strength or degree of 
linear association between two variables. The correlation coefficient, which we shall 
study in detail in Chapter 3, measures this strength of (linear) association. For example, we 
may be interested in finding the correlation (coefficient) between smoking and lung cancer, 
between scores on statistics and mathematics examinations, between high school grades 
and college grades, and so on. In regression analysis, as already noted, we are not 
primarily interested in such a measure. Instead, we try to estimate or predict the average value of 
one variable on the basis of the fixed values of other variables. Thus, we may want to know 
whether we can predict the average score on a statistics examination by knowing a student’s 
score on a mathematics examination. 

Regression and correlation have some fundamental differences that are worth 
mentioning. In regression analysis there is an asymmetry in the way the dependent and explanatory 
variables are treated. The dependent variable is assumed to be statistical, random, or 
stochastic, that is, to have a probability distribution. The explanatory variables, on the other 
hand, are assumed to have fixed values (in repeated sampling), 7 which was made explicit in 
the definition of regression given in Section 1.2. Thus, in Figure 1.2 we assumed that the 
variable age was fixed at given levels and height measurements were obtained at these 
levels. In correlation analysis, on the other hand, we treat any (two) variables 
symmetrically; there is no distinction between the dependent and explanatory variables. After all, the 
correlation between scores on mathematics and statistics examinations is the same as that 
between scores on statistics and mathematics examinations. Moreover, both variables 
are assumed to be random. As we shall see, most of the correlation theory is based on the 
assumption of randomness of variables, whereas most of the regression theory to be 
expounded in this book is conditional upon the assumption that the dependent variable is 
stochastic but the explanatory variables are fixed or nonstochastic. 8 


6 But as we shall see in Chapter 3, classical regression analysis is based on the assumption that the 
model used in the analysis is the correct model. Therefore, the direction of causality may be implicit 
in the model postulated. 

7 It is crucial to note that the explanatory variables may be intrinsically stochastic, but for the purpose 
of regression analysis we assume that their values are fixed in repeated sampling (that is, X assumes 
the same values in various samples), thus rendering them in effect nonrandom or nonstochastic. But 
more on this in Chapter 3, Sec. 3.2. 

8 In advanced treatments of econometrics, one can relax the assumption that the explanatory variables 
are nonstochastic (see introduction to Part 2). 
Before we proceed to a formal analysis of regression theory, let us dwell briefly on the 
matter of terminology and notation. In the literature the terms dependent variable and 
explanatory variable are described variously. A representative list is: 


Dependent variable   ⇔  Explanatory variable 
Explained variable   ⇔  Independent variable 
Predictand           ⇔  Predictor 
Regressand           ⇔  Regressor 
Response             ⇔  Stimulus 
Endogenous           ⇔  Exogenous 
Outcome              ⇔  Covariate 
Controlled variable  ⇔  Control variable 


Although it is a matter of personal taste and tradition, in this text we will use the dependent 
variable/explanatory variable or the more neutral regressand and regressor terminology. 

If we are studying the dependence of a variable on only a single explanatory variable, 
such as that of consumption expenditure on real income, such a study is known as simple, 
or two-variable, regression analysis. However, if we are studying the dependence of one 
variable on more than one explanatory variable, as in the crop-yield, rainfall, temperature, 
sunshine, and fertilizer example, it is known as multiple regression analysis. In other 
words, in two-variable regression there is only one explanatory variable, whereas in 
multiple regression there is more than one explanatory variable. 

The term random is a synonym for the term stochastic. As noted earlier, a random or 
stochastic variable is a variable that can take on any set of values, positive or negative, with 
a given probability. 9 

Unless stated otherwise, the letter $Y$ will denote the dependent variable and the $X$s 
($X_1, X_2, \ldots, X_k$) will denote the explanatory variables, $X_k$ being the $k$th explanatory 
variable. The subscript $i$ or $t$ will denote the $i$th or the $t$th observation or value. $X_{ki}$ (or $X_{kt}$) 
will denote the $i$th (or $t$th) observation on variable $X_k$. $N$ (or $T$) will denote the total 
number of observations or values in the population, and $n$ (or $t$) the total number of 
observations in a sample. As a matter of convention, the observation subscript $i$ will be used for 
cross-sectional data (i.e., data collected at one point in time) and the subscript $t$ will be 
used for time series data (i.e., data collected over a period of time). The nature of 
cross-sectional and time series data, as well as the important topic of the nature and sources of 
data for empirical analysis, is discussed in the following section. 


9 See Appendix A for formal definition and further details. 
1.7 The Nature and Sources of Data for Economic Analysis 10 

The success of any econometric analysis ultimately depends on the availability of the 
appropriate data. It is therefore essential that we spend some time discussing the nature, 
sources, and limitations of the data that one may encounter in empirical analysis. 

Types of Data 

Three types of data may be available for empirical analysis: time series, cross-section, and 
pooled (i.e., combination of time series and cross-section) data. 

Time Series Data 

The data shown in Table 1.1 of the Introduction are an example of time series data. A time 
series is a set of observations on the values that a variable takes at different times. Such data 
may be collected at regular time intervals, such as daily (e.g., stock prices, weather 
reports), weekly (e.g., money supply figures), monthly (e.g., the unemployment rate, the 
Consumer Price Index [CPI]), quarterly (e.g., GDP), annually (e.g., government 
budgets), quinquennially, that is, every 5 years (e.g., the census of manufactures), or 
decennially, that is, every 10 years (e.g., the census of population). Sometimes data are 
available both quarterly and annually, as in the case of the data on GDP and consumer 
expenditure. With the advent of high-speed computers, data can now be collected over an 
extremely short interval of time, such as the data on stock prices, which can be obtained 
literally continuously (the so-called real-time quote). 

Although time series data are used heavily in econometric studies, they present special 
problems for econometricians. As we will show in chapters on time series econometrics 
later on, most empirical work based on time series data assumes that the underlying time 
series is stationary. Although it is too early to introduce the precise technical meaning of 
stationarity at this juncture, loosely speaking, a time series is stationary if its mean and 
variance do not vary systematically over time. To see what this means, consider Figure 1.5, 
which depicts the behavior of the M1 money supply in the United States from January 1, 
1959, to September, 1999. (The actual data are given in Exercise 1.4.) As you can see from 
this figure, the M1 money supply shows a steady upward trend as well as variability over 
the years, suggesting that the M1 time series is not stationary. 11 We will explore this topic 
fully in Chapter 21. 
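The informal check in footnote 11, comparing means and standard deviations across subperiods, can be sketched as follows. The series here is simulated, a hypothetical upward trend plus random noise, not the actual M1 data, and the function name subperiod_summary is illustrative; a formal test of stationarity is a different matter, left to Chapter 21.

```python
import random
import statistics

def subperiod_summary(series, n_periods):
    """Split `series` into n_periods consecutive blocks of equal length and
    return (mean, standard deviation) for each block.
    Any leftover observations at the end are ignored."""
    size = len(series) // n_periods
    blocks = [series[i * size:(i + 1) * size] for i in range(n_periods)]
    return [(statistics.fmean(b), statistics.stdev(b)) for b in blocks]

# Hypothetical monthly series: upward trend plus noise (NOT the actual M1 data).
rng = random.Random(1)
series = [100 + 2 * t + rng.gauss(0, 3) for t in range(48)]

for mean, sd in subperiod_summary(series, 4):
    print(f"mean = {mean:7.2f}   sd = {sd:5.2f}")
```

Subperiod means that climb steadily from one block to the next, as they do for this trending series, are a rough sign that the mean is not constant over time.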

Cross-Section Data 

Cross-section data are data on one or more variables collected at the same point in time, 
such as the census of population conducted by the Census Bureau every 10 years (the 
latest being in year 2000), the surveys of consumer expenditures conducted by the University 
of Michigan, and, of course, the opinion polls by Gallup and umpteen other organizations. 
A concrete example of cross-sectional data is given in Table 1.1. This table gives data on 
egg production and egg prices for the 50 states in the union for 1990 and 1991. For each 


10 For an informative account, see Michael D. Intriligator, Econometric Models, Techniques, and 
Applications, Prentice Hall, Englewood Cliffs, N.J., 1978, chap. 3. 

"To see this more clearly, we divided the data into four time periods: 1951:01 to 1962:12; 1963:01 
to 1974:12; 1975:01 to 1986:12, and 1987:01 to 1999:09: For these subperiods the mean values of 
the money supply (with corresponding standard deviations in parentheses) were, respectively, 165.88 
(23.27), 323.20 (72.66), 788.12 (195.43), and 1099 (27.84), all figures in billions of dollars. This is a 
rough indication of the fact that the money supply over the entire period was not stationary. 
FIGURE 1.5 

M1 money supply: 
United States, 
1951:01-1999:09. 



year the data on the 50 states are cross-sectional data. Thus, in Table 1.1 we have two 
cross-sectional samples. 

Just as time series data create their own special problems (because of the stationarity 
issue), cross-sectional data too have their own problems, specifically the problem of 
heterogeneity. From the data given in Table 1.1 we see that we have some states that produce huge 
amounts of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska). When 
we include such heterogeneous units in a statistical analysis, the size or scale effect must be 
taken into account so as not to mix apples with oranges. To see this clearly, we plot in 
Figure 1.6 the data on eggs produced and their prices in 50 states for the year 1990. This figure 
shows how widely scattered the observations are. In Chapter 11 we will see how the scale 
effect can be an important factor in assessing relationships among economic variables. 

Pooled Data 

Pooled, or combined, data contain elements of both time series and cross-section data. The 
data in Table 1.1 are an example of pooled data. For each year we have 50 cross-sectional 
observations and for each state we have two time series observations on prices and output 
of eggs, a total of 100 pooled (or combined) observations. Likewise, the data given in 
Exercise 1.1 are pooled data in that the Consumer Price Index (CPI) for each country 
for 1980-2005 is time series data, whereas the data on the CPI for the seven countries 
for a single year are cross-sectional data. In the pooled data we have 182 observations— 
26 annual observations for each of the seven countries. 
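Arranging the egg data as pooled observations amounts to stacking the two yearly cross sections into one long list of state-year rows. A minimal Python sketch, using three states from Table 1.1 (the function name `to_pooled` is ours, for illustration):

```python
def to_pooled(wide, years=(1990, 1991)):
    """Stack {state: (eggs_y1, eggs_y2, price_y1, price_y2)} into long-format
    rows of (state, year, eggs, price): one row per state-year pair."""
    rows = []
    for state, (eggs1, eggs2, price1, price2) in wide.items():
        rows.append((state, years[0], eggs1, price1))
        rows.append((state, years[1], eggs2, price2))
    return rows


# Three states from Table 1.1 (eggs in millions, price in cents per dozen).
sample = {
    "AL": (2206, 2186, 92.7, 91.4),
    "AK": (0.7, 0.7, 151.0, 149.0),
    "AZ": (73, 74, 61.0, 56.0),
}

pooled = to_pooled(sample)
for row in pooled:
    print(row)

# The same counting for the CPI data of Exercise 1.1: 7 countries x 26 years.
n_cpi = 7 * len(range(1980, 2006))
print(len(pooled), n_cpi)
```

Applied to all 50 states, the same function would return the 100 pooled observations mentioned above.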

Panel, Longitudinal, or Micropanel Data 

This is a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time. For example, the U.S. Department of Commerce carries out a census of housing at periodic intervals. At each periodic survey the same household (or the people living at the same address) is interviewed to find out if there has been any change in the housing and financial conditions of that household since the last survey. By interviewing the same household periodically, the panel data provide very useful information on the dynamics of household behavior, as we shall see in Chapter 16.
FIGURE 1.6
TABLE 1.1 U.S. Egg Production

State     Y1      Y2      X1      X2    State     Y1      Y2      X1      X2
AL     2,206   2,186    92.7    91.4    MT       172     164    68.0    66.0
AK       0.7     0.7   151.0   149.0    NE     1,202   1,400    50.3    48.9
AZ        73      74    61.0    56.0    NV       2.2     1.8    53.9    52.7
AR     3,620   3,737    86.3    91.8    NH        43      49   109.0   104.0
CA     7,472   7,444    63.4    58.4    NJ       442     491    85.0    83.0
CO       788     873    77.8    73.0    NM       283     302    74.0    70.0
CT     1,029     948   106.0   104.0    NY       975     987    68.1    64.0
DE       168     164   117.0   113.0    NC     3,033   3,045    82.8    78.7
FL     2,586   2,537    62.0    57.2    ND        51      45    55.2    48.0
GA     4,302   4,301    80.6    80.8    OH     4,667   4,637    59.1    54.7
HI     227.5   224.5    85.0    85.5    OK       869     830   101.0   100.0
ID       187     203    79.1    72.9    OR       652     686    77.0    74.6
IL       793     809    65.0    70.5    PA     4,976   5,130    61.0    52.0
IN     5,445   5,290    62.7    60.1    RI        53      50   102.0    99.0
IA     2,151   2,247    56.5    53.0    SC     1,422   1,420    70.1    65.9
KS       404     389    54.5    47.8    SD       435     602    48.0    45.8
KY       412     483    67.7    73.5    TN       277     279    71.0    80.7
LA       273     254   115.0   115.0    TX     3,317   3,356    76.7    72.6
ME     1,069   1,070   101.0    97.0    UT       456     486    64.0    59.0
MD       885     898    76.6    75.4    VT        31      30   106.0   102.0
MA       235     237   105.0   102.0    VA       943     988    86.3    81.2
MI     1,406   1,396    58.0    53.8    WA     1,287   1,313    74.1    71.5
MN     2,499   2,697    57.7    54.0    WV       136     174   104.0   109.0
MS     1,434   1,468    87.8    86.7    WI       910     873    60.1    54.0
MO     1,580   1,622    55.4    51.5    WY       1.7     1.7    83.0    83.0

Note: Y1 = eggs produced in 1990 (millions).
      Y2 = eggs produced in 1991 (millions).
      X1 = price per dozen (cents) in 1990.
      X2 = price per dozen (cents) in 1991.

Source: World Almanac, 1993, p. 119. The data are from the Economic Research Service, U.S. Department of Agriculture.
As a concrete example, consider the data given in Table 1.2. The data in the table, originally collected by Y. Grunfeld, refer to the real investment, the real value of the firm, and the real capital stock of four U.S. companies, namely, General Electric (GE), U.S. Steel (US), General Motors (GM), and Westinghouse (WEST), for the period 1935-1954. 12 Since the data are for several companies collected over a number of years, this is a classic example of panel data. In this table, the number of observations for each company is the same, but this is not always the case. If all the companies have the same number of observations, we have what is called a balanced panel. If the number of observations is not the same for each company, it is called an unbalanced panel. In Chapter 16, Panel Data Regression Models, we will examine such data and show how to estimate such models.
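The balanced/unbalanced distinction can be checked mechanically by counting observations per unit. The sketch below follows the definition just given (the same number of observations for every company); the helper name `panel_type` and the trimmed example are hypothetical, for illustration only.

```python
from collections import Counter


def panel_type(observations):
    """Classify a panel from (unit, year) pairs, following the text's definition:
    'balanced' if every unit has the same number of observations, else 'unbalanced'."""
    counts = Counter(unit for unit, _ in observations)
    return "balanced" if len(set(counts.values())) == 1 else "unbalanced"


companies = ["GE", "US", "GM", "WEST"]
years = range(1935, 1955)  # 1935-1954 inclusive: 20 years

grunfeld_index = [(c, t) for c in companies for t in years]
print(len(grunfeld_index), panel_type(grunfeld_index))

# Hypothetical variant: drop Westinghouse's 1954 observation.
trimmed = [(c, t) for c, t in grunfeld_index if (c, t) != ("WEST", 1954)]
print(len(trimmed), panel_type(trimmed))
```

A stricter check, also used in practice, would additionally require every unit to be observed in the same periods, not merely the same number of them.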

Grunfeld’s purpose in collecting these data was to find out how real gross investment (I) depends on the real value of the firm (F) a year earlier and real capital stock (C) a year earlier. Since the companies included in the sample operate in the same capital market, by studying them together, Grunfeld wanted to find out if they had similar investment functions.
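The investment function Grunfeld had in mind can be written I = b1 + b2 F_{-1} + b3 C_{-1} + u. As a sketch of how one firm's version might be fitted, the code below applies ordinary least squares, a method developed in later chapters, using its standard two-regressor formulas in deviation form, to the GE rows of Table 1.2. It assumes, as the table heading indicates, that the F and C columns are already lagged one year. The function name `ols_two_regressors` is ours.

```python
def ols_two_regressors(y, x2, x3):
    """OLS estimates (b1, b2, b3) of y = b1 + b2*x2 + b3*x3 + u,
    computed from deviations about the sample means."""
    n = len(y)
    my, m2, m3 = sum(y) / n, sum(x2) / n, sum(x3) / n
    dy = [v - my for v in y]
    d2 = [v - m2 for v in x2]
    d3 = [v - m3 for v in x3]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    sy2 = sum(a * b for a, b in zip(dy, d2))
    sy3 = sum(a * b for a, b in zip(dy, d3))
    det = s22 * s33 - s23 ** 2  # nonzero unless the two regressors are collinear
    b2 = (sy2 * s33 - sy3 * s23) / det
    b3 = (sy3 * s22 - sy2 * s23) / det
    b1 = my - b2 * m2 - b3 * m3
    return b1, b2, b3


# GE rows of Table 1.2, 1935-1954: gross investment I, F_{-1}, C_{-1}.
ge_I = [33.1, 45.0, 77.2, 44.6, 48.1, 74.4, 113.0, 91.9, 61.3, 56.8,
        93.6, 159.9, 147.2, 146.3, 98.3, 93.5, 135.2, 157.3, 179.5, 189.6]
ge_F = [1170.6, 2015.8, 2803.3, 2039.7, 2256.2, 2132.2, 1834.1, 1588.0, 1749.4, 1687.2,
        2007.7, 2208.3, 1656.7, 1604.4, 1431.8, 1610.5, 1819.4, 2079.7, 2371.6, 2759.9]
ge_C = [97.8, 104.4, 118.0, 156.2, 172.6, 186.6, 220.9, 287.8, 319.9, 321.3,
        319.6, 346.0, 456.4, 543.4, 618.3, 647.4, 671.3, 726.1, 800.3, 888.9]

b1, b2, b3 = ols_two_regressors(ge_I, ge_F, ge_C)
print(f"I = {b1:.3f} + {b2:.4f} F + {b3:.4f} C   (F and C lagged one year)")
```

Repeating the fit for US, GM, and WEST and comparing the coefficients is one informal way to ask whether the four investment functions look alike; Chapter 16 treats the question properly.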

The Sources of Data 13 

The data used in empirical analysis may be collected by a governmental agency (e.g., the Department of Commerce), an international agency (e.g., the International Monetary Fund [IMF] or the World Bank), a private organization (e.g., the Standard & Poor’s Corporation), or an individual. There are literally thousands of such agencies collecting data for one purpose or another.

The Internet 

The Internet has literally revolutionized data gathering. If you just “surf the net” with a 
keyword (e.g., exchange rates), you will be swamped with all kinds of data sources. In 
Appendix E we provide some of the frequently visited websites that provide economic and 
financial data of all sorts. Most of the data can be downloaded without much cost. You may 
want to bookmark the various websites that might provide you with useful economic data. 

The data collected by various agencies may be experimental or nonexperimental. 
In experimental data, often collected in the natural sciences, the investigator may want to 
collect data while holding certain factors constant in order to assess the impact of some 
factors on a given phenomenon. For instance, in assessing the impact of obesity on blood 
pressure, the researcher would want to collect data while holding constant the eating, 
smoking, and drinking habits of the people in order to minimize the influence of these 
variables on blood pressure. 

In the social sciences, the data that one generally encounters are nonexperimental in 
nature, that is, not subject to the control of the researcher. 14 For example, the data on GNP, 
unemployment, stock prices, etc., are not directly under the control of the investigator. As we 
shall see, this lack of control often creates special problems for the researcher in pinning 
down the exact cause or causes affecting a particular situation. For example, is it the money 
supply that determines the (nominal) GDP or is it the other way around? 


12 Y. Grunfeld, "The Determinants of Corporate Investment," unpublished PhD thesis, Department of 
Economics, University of Chicago, 1958. These data have become a workhorse for illustrating panel 
data regression models. 

13 For an illuminating account, see Albert T. Somers, The U.S. Economy Demystified: What the Major 
Economic Statistics Mean and their Significance for Business, D.C. Heath, Lexington, Mass., 1985. 

14 In the social sciences, too, one can sometimes conduct a controlled experiment. An example is given in Exercise 1.6.
TABLE 1.2 Investment Data for Four Companies, 1935-1954

Observation       I   F_{-1}   C_{-1}    Observation       I   F_{-1}   C_{-1}

GE                                       US
1935           33.1   1170.6     97.8    1935          209.9   1362.4     53.8
1936           45.0   2015.8    104.4    1936          355.3   1807.1     50.5
1937           77.2   2803.3    118.0    1937          469.9   2673.3    118.1
1938           44.6   2039.7    156.2    1938          262.3   1801.9    260.2
1939           48.1   2256.2    172.6    1939          230.4   1957.3    312.7
1940           74.4   2132.2    186.6    1940          361.6   2202.9    254.2
1941          113.0   1834.1    220.9    1941          472.8   2380.5    261.4
1942           91.9   1588.0    287.8    1942          445.6   2168.6    298.7
1943           61.3   1749.4    319.9    1943          361.6   1985.1    301.8
1944           56.8   1687.2    321.3    1944          288.2   1813.9    279.1
1945           93.6   2007.7    319.6    1945          258.7   1850.2    213.8
1946          159.9   2208.3    346.0    1946          420.3   2067.7    232.6
1947          147.2   1656.7    456.4    1947          420.5   1796.7    264.8
1948          146.3   1604.4    543.4    1948          494.5   1625.8    306.9
1949           98.3   1431.8    618.3    1949          405.1   1667.0    351.1
1950           93.5   1610.5    647.4    1950          418.8   1677.4    357.8
1951          135.2   1819.4    671.3    1951          588.2   2289.5    341.1
1952          157.3   2079.7    726.1    1952          645.2   2159.4    444.2
1953          179.5   2371.6    800.3    1953          641.0   2031.3    623.6
1954          189.6   2759.9    888.9    1954          459.3   2115.5    669.7

GM                                       WEST
1935          317.6   3078.5      2.8    1935          12.93    191.5      1.8
1936          391.8   4661.7     52.6    1936          25.90    516.0      0.8
1937          410.6   5387.1    156.9    1937          35.05    729.0      7.4
1938          257.7   2792.2    209.2    1938          22.89    560.4     18.1
1939          330.8   4313.2    203.4    1939          18.84    519.9     23.5
1940          461.2   4643.9    207.2    1940          28.57    628.5     26.5
1941          512.0   4551.2    255.2    1941          48.51    537.1     36.2
1942          448.0   3244.1    303.7    1942          43.34    561.2     60.8
1943          499.6   4053.7    264.1    1943          37.02    617.2     84.4
1944          547.5   4379.3    201.6    1944          37.81    626.7     91.2
1945          561.2   4840.9    265.0    1945          39.27    737.2     92.4
1946          688.1   4900.0    402.2    1946          53.46    760.5     86.0
1947          568.9   3526.5    761.5    1947          55.56    581.4    111.1
1948          529.2   3245.7    922.4    1948          49.56    662.3    130.6
1949          555.1   3700.2   1020.1    1949          32.04    583.8    141.8
1950          642.9   3755.6   1099.0    1950          32.24    635.2    136.7
1951          755.9   4833.0   1207.7    1951          54.38    732.8    129.7
1952          891.2   4924.9   1430.5    1952          71.78    864.1    145.5
1953         1304.4   6241.7   1777.3    1953          90.08   1193.5    174.8
1954         1486.7   5593.6   2226.3    1954          68.60   1188.9    213.5

Notes: Y = I = gross investment = additions to plant and equipment plus maintenance and repairs, in millions of dollars deflated by P1.

X2 = F = value of the firm = price of common and preferred shares at Dec. 31 (or average price of Dec. 31 and Jan. 31 of the following year) times number of common and preferred shares outstanding plus total book value of debt at Dec. 31, in millions of dollars deflated by P2.

X3 = C = stock of plant and equipment = accumulated sum of net additions to plant and equipment deflated by P1 minus depreciation allowance deflated by P3 in these definitions.

P1 = implicit price deflator of producers’ durable equipment (1947 = 100).

P2 = implicit price deflator of GNP (1947 = 100).

P3 = depreciation expense deflator = 10-year moving average of wholesale price index of metals and metal products (1947 = 100).

Source: Reproduced from H. D. Vinod and Aman Ullah, Recent Advances in Regression Methods, Marcel Dekker, New York, 1981, pp. 259-261.
The Accuracy of Data 15

Although plenty of data are available for economic research, the quality of the data is often 
not that good. There are several reasons for that. 

1. As noted, most social science data are nonexperimental in nature. Therefore, there is the 
possibility of observational errors, either of omission or commission. 

2. Even in experimentally collected data, errors of measurement arise from approximations and roundoffs.

3. In questionnaire-type surveys, the problem of nonresponse can be serious; a researcher is lucky to get a 40 percent response rate to a questionnaire. Analysis based on such a partial response rate may not truly reflect the behavior of the 60 percent who did not respond, thereby leading to what is known as (sample) selectivity bias. Then there is the further problem that those who do respond to the questionnaire may not answer all the questions, especially questions of a financially sensitive nature, thus leading to additional selectivity bias.

4. The sampling methods used in obtaining the data may vary so widely that it is often difficult to compare the results obtained from the various samples.

5. Economic data are generally available at a highly aggregate level. For example, most macrodata (e.g., GNP, employment, inflation, unemployment) are available for the economy as a whole or at the most for some broad geographical regions. Such highly aggregated data may not tell us much about the individuals or microunits that may be the ultimate object of study.

6. Because of confidentiality, certain data can be published only in highly aggregate form. The IRS, for example, is not allowed by law to disclose data on individual tax returns; it can only release some broad summary data. Therefore, if one wants to find out how much individuals with a certain level of income spent on health care, one cannot do so except at a very highly aggregate level. Such macroanalysis often fails to reveal the dynamics of the behavior of the microunits. Similarly, the Department of Commerce, which conducts the census of business every 5 years, is not allowed to disclose information on production, employment, energy consumption, research and development expenditure, etc., at the firm level. It is therefore difficult to study the interfirm differences on these items.
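To see how nonresponse of the kind described in item 3 can bias results, here is a small simulation sketch. The income distribution and the response rule (lower-income households answer more often) are invented assumptions, chosen only so that roughly 40 percent respond; nothing here is drawn from a real survey.

```python
import random


def nonresponse_demo(n=100_000, seed=1):
    """Simulate a survey in which the chance of responding depends on income.

    Returns (population mean income, respondents' mean income, response rate)."""
    rng = random.Random(seed)
    incomes = [rng.lognormvariate(10.5, 0.6) for _ in range(n)]
    median = sorted(incomes)[n // 2]
    # Hypothetical response rule: 60% of below-median households answer,
    # but only 20% of the others, for an overall rate near 40%.
    respondents = [y for y in incomes if rng.random() < (0.6 if y < median else 0.2)]
    pop_mean = sum(incomes) / n
    resp_mean = sum(respondents) / len(respondents)
    return pop_mean, resp_mean, len(respondents) / n


pop_mean, resp_mean, rate = nonresponse_demo()
print(f"response rate: {rate:.1%}")
print(f"population mean income: {pop_mean:,.0f}")
print(f"respondents' mean income: {resp_mean:,.0f}")
```

Because the respondents are not a random subset of the population, their average understates the population average even with a very large sample; collecting more responses of the same kind would not remove the bias.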

Because of all of these and many other problems, the researcher should always keep in mind that the results of research are only as good as the quality of the data. Therefore, if in given situations researchers find that the results of the research are “unsatisfactory,” the cause may be not that they used the wrong model but that the quality of the data was poor. Unfortunately, because of the nonexperimental nature of the data used in most social science studies, researchers very often have no choice but to depend on the available data. But they should always keep in mind that the data used may not be the best and should try not to be too dogmatic about the results obtained from a given study, especially when the quality of the data is suspect.

A Note on the Measurement Scales of Variables 16 

The variables that we will generally encounter fall into four broad categories: ratio scale, 
interval scale, ordinal scale, and nominal scale. It is important that we understand each. 


15 For a critical review, see O. Morgenstern, The Accuracy of Economic Observations, 2d ed., Princeton 
University Press, Princeton, N.J., 1963. 

16 The following discussion relies heavily on Aris Spanos, Probability Theory and Statistical Inference: 
Econometric Modeling with Observational Data, Cambridge University Press, New York, 1999, p. 24. 
Ratio Scale

For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 - X1) are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the values along the scale. Therefore, comparisons such as X2 < X1 or X2 > X1 are meaningful. Most economic variables belong to this category. Thus, it is meaningful to ask how big this year’s GDP is compared with the previous year’s GDP. Personal income, measured in dollars, is a ratio variable; someone earning $100,000 is making twice as much as another person earning $50,000 (before taxes are assessed, of course!).

Interval Scale 

An interval scale variable satisfies the last two properties of the ratio scale variable but not 
the first. Thus, the distance between two time periods, say (2000-1995) is meaningful, but 
not the ratio of two time periods (2000/1995). At 11:00 a.m. PST on August 11, 2007, 
Portland, Oregon, reported a temperature of 60 degrees Fahrenheit while Tallahassee, 
Florida, reached 90 degrees. Temperature is not measured on a ratio scale since it does not 
make sense to claim that Tallahassee was 50 percent warmer than Portland. This is mainly 
due to the fact that the Fahrenheit scale does not use 0 degrees as a natural base. 
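The Portland/Tallahassee example can be checked numerically: converting the same two readings to other temperature scales keeps their difference proportional but changes their ratio, which is exactly why the Fahrenheit (and Celsius) scale is interval rather than ratio. A short Python sketch using the standard conversion formulas:

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9


def fahrenheit_to_kelvin(f):
    return fahrenheit_to_celsius(f) + 273.15


portland, tallahassee = 60.0, 90.0  # degrees Fahrenheit, from the example in the text

# Ratios depend on where each scale puts its zero point...
ratio_f = tallahassee / portland
ratio_c = fahrenheit_to_celsius(tallahassee) / fahrenheit_to_celsius(portland)
ratio_k = fahrenheit_to_kelvin(tallahassee) / fahrenheit_to_kelvin(portland)

# ...whereas differences only change by the constant factor 5/9.
diff_f = tallahassee - portland
diff_c = fahrenheit_to_celsius(tallahassee) - fahrenheit_to_celsius(portland)

print(f"ratio in F: {ratio_f:.2f}, in C: {ratio_c:.2f}, in K: {ratio_k:.2f}")
print(f"difference: {diff_f:.1f} F = {diff_c:.2f} C")
```

The same two days look "50 percent warmer" in Fahrenheit, roughly twice as warm in Celsius, and about 6 percent warmer in Kelvin. Only the Kelvin scale, whose zero is absolute zero, is a ratio scale.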

Ordinal Scale 

A variable belongs to this category only if it satisfies the third property of the ratio scale 
(i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class 
(upper, middle, lower). For these variables the ordering exists but the distances between the 
categories cannot be quantified. Students of economics will recall the indifference curves 
between two goods. Each higher indifference curve indicates a higher level of utility, but 
one cannot quantify by how much one indifference curve is higher than the others. 

Nominal Scale 

Variables in this category have none of the features of the ratio scale variables. Variables 
such as gender (male, female) and marital status (married, unmarried, divorced, separated) 
simply denote categories. Question: What is the reason why such variables cannot be 
expressed on the ratio, interval, or ordinal scales? 

As we shall see, econometric techniques that may be suitable for ratio scale variables 
may not be suitable for nominal scale variables. Therefore, it is important to bear in mind 
the distinctions among the four types of measurement scales discussed above. 
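One common way to bring a nominal variable such as marital status into a regression is to represent each category by a 0/1 dummy variable instead of arbitrary numeric codes (1, 2, 3, 4), which would impose an ordering and distances that the categories do not have. Dummy variables are taken up later in the book; the helper below is only a minimal sketch, its name is ours, and the choice of "unmarried" as the baseline category is arbitrary.

```python
def dummy_encode(values, categories, baseline):
    """Encode a nominal variable as 0/1 dummy columns, one for every category
    except the baseline, which is represented by a row of all zeros."""
    columns = [c for c in categories if c != baseline]
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


status = ["married", "unmarried", "divorced", "married", "separated"]
categories = ["married", "unmarried", "divorced", "separated"]

columns, rows = dummy_encode(status, categories, baseline="unmarried")
print(columns)
for s, r in zip(status, rows):
    print(f"{s:<10} {r}")
```

Dropping one category as the baseline keeps the dummies from duplicating the information in a regression's intercept.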


Summary and Conclusions


1. The key idea behind regression analysis is the statistical dependence of one variable, the 
dependent variable, on one or more other variables, the explanatory variables. 

2. The objective of such analysis is to estimate and/or predict the mean or average value of the 
dependent variable on the basis of the known or fixed values of the explanatory variables. 

3. In practice the success of regression analysis depends on the availability of the appropriate data. This chapter discussed the nature, sources, and limitations of the data that are generally available for research, especially in the social sciences.

4. In any research, the researcher should clearly state the sources of the data used in 
the analysis, their definitions, their methods of collection, and any gaps or omissions 
in the data as well as any revisions in the data. Keep in mind that the macroeconomic 
data published by the government are often revised. 

5. Since the reader may not have the time, energy, or resources to track down the data, the 
reader has the right to presume that the data used by the researcher have been properly 
gathered and that the computations and analysis are correct. 
EXERCISES

1.1. Table 1.3 gives data on the Consumer Price Index (CPI) for seven industrialized countries with 1982-1984 = 100 as the base of the index.

a. From the given data, compute the inflation rate for each country. 17 

b. Plot the inflation rate for each country against time (i.e., use the horizontal axis for 
time and the vertical axis for the inflation rate). 

c. What broad conclusions can you draw about the inflation experience in the seven 
countries? 

d. Which country’s inflation rate seems to be most variable? Can you offer any 
explanation? 

1.2. a. Using Table 1.3, plot the inflation rate of Canada, France, Germany, Italy, Japan, 
and the United Kingdom against the United States inflation rate. 

b. Comment generally about the behavior of the inflation rate in the six countries 
vis-a-vis the U.S. inflation rate. 

c. If you find that the six countries’ inflation rates move in the same direction as the 
U.S. inflation rate, would that suggest that U.S. inflation “causes” inflation in the 
other countries? Why or why not? 


TABLE 1.3 CPI in Seven Industrial Countries, 1980-2005 (1982-1984 = 100)

Year    U.S.   Canada   Japan   France   Germany   Italy    U.K.
1980    82.4     76.1    91.0     72.2      86.7    63.9    78.5
1981    90.9     85.6    95.3     81.8      92.2    75.5    87.9
1982    96.5     94.9    98.1     91.7      97.0    87.8    95.4
1983    99.6    100.4    99.8    100.3     100.3   100.8    99.8
1984   103.9    104.7   102.1    108.0     102.7   111.4   104.8
1985   107.6    109.0   104.2    114.3     104.8   121.7   111.1
1986   109.6    113.5   104.9    117.2     104.6   128.9   114.9
1987   113.6    118.4   104.9    121.1     104.9   135.1   119.7
1988   118.3    123.2   105.6    124.3     106.3   141.9   125.6
1989   124.0    129.3   108.0    128.7     109.2   150.7   135.4
1990   130.7    135.5   111.4    132.9     112.2   160.4   148.2
1991   136.2    143.1   115.0    137.2     116.3   170.5   156.9
1992   140.3    145.3   117.0    140.4     122.2   179.5   162.7
1993   144.5    147.9   118.5    143.4     127.6   187.7   165.3
1994   148.2    148.2   119.3    145.8     131.1   195.3   169.3
1995   152.4    151.4   119.2    148.4     133.3   205.6   175.2
1996   156.9    153.8   119.3    151.4     135.3   213.8   179.4
1997   160.5    156.3   121.5    153.2     137.8   218.2   185.1
1998   163.0    157.8   122.2    154.2     139.1   222.5   191.4
1999   166.6    160.5   121.8    155.0     140.0   226.2   194.3
2000   172.2    164.9   121.0    157.6     142.0   231.9   200.1
2001   177.1    169.1   120.1    160.2     144.8   238.3   203.6
2002   179.9    172.9   119.0    163.3     146.7   244.3   207.0
2003   184.0    177.7   118.7    166.7     148.3   250.8   213.0
2004   188.9    181.0   118.7    170.3     150.8   256.3   219.4
2005   195.3    184.9   118.3    173.2     153.7   261.3   225.6

Source: Economic Report of the President, p. 354.


17 Subtract from the current year's CPI the CPI from the previous year, divide the difference by the previous year's CPI, and multiply the result by 100. Thus, the inflation rate for Canada for 1981 is [(85.6 - 76.1)/76.1] × 100 = 12.48% (approx.).
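The computation in footnote 17 can be sketched in Python as follows. It assumes the CPI mapping covers consecutive years, as in Table 1.3, and the function name `inflation_rates` is ours.

```python
def inflation_rates(cpi):
    """Year-over-year inflation (percent) from a {year: CPI} mapping, following
    footnote 17: (CPI_t - CPI_{t-1}) / CPI_{t-1} x 100. The first year has no rate."""
    years = sorted(cpi)
    return {
        year: (cpi[year] - cpi[prev]) / cpi[prev] * 100
        for prev, year in zip(years, years[1:])
    }


# Canada's CPI for 1980-1983 from Table 1.3 (1982-1984 = 100).
canada = {1980: 76.1, 1981: 85.6, 1982: 94.9, 1983: 100.4}

rates = inflation_rates(canada)
for year, rate in rates.items():
    print(year, f"{rate:.2f}%")
```

The 1981 figure reproduces the 12.48 percent worked out in the footnote; the same function applied to each column of Table 1.3 answers part a of Exercise 1.1.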
1.3. Table 1.4 gives the foreign exchange rates for nine industrialized countries for the years 1985-2006. Except for Australia and the United Kingdom, the exchange rate is defined as the units of foreign currency for one U.S. dollar; for Australia and the United Kingdom, it is defined as the number of U.S. dollars for one unit of the foreign currency (one Australian dollar or one U.K. pound).

a. Plot these exchange rates against time and comment on the general behavior of the 
exchange rates over the given time period. 

b. The dollar is said to appreciate if it can buy more units of a foreign currency. Conversely, it is said to depreciate if it buys fewer units of a foreign currency. Over the time period 1985-2006, what has been the general behavior of the U.S. dollar? Incidentally, look up any textbook on macroeconomics or international economics to find out what factors determine the appreciation or depreciation of a currency.
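For part b, the quotation convention has to be handled before the direction of change can be read off. A minimal sketch (the function name and the three currencies chosen are ours) that converts a pair of 1985 and 2006 rates from Table 1.4 into the percent change in foreign currency per dollar:

```python
def dollar_change(start, end, usd_per_foreign=False):
    """Percent change in the foreign-currency price of one U.S. dollar.

    Positive: the dollar appreciated (it buys more foreign currency).
    Negative: the dollar depreciated.
    Set usd_per_foreign=True for rates quoted as U.S. dollars per foreign unit,
    such as the U.K. pound in Table 1.4; those rates are inverted first."""
    if usd_per_foreign:
        start, end = 1 / start, 1 / end
    return (end - start) / start * 100


# 1985 and 2006 values from Table 1.4.
yen = dollar_change(238.47, 116.31)
pound = dollar_change(1.2974, 1.8434, usd_per_foreign=True)
peso = dollar_change(0.257, 10.906)

for name, pct in [("Japanese yen", yen), ("U.K. pound", pound), ("Mexican peso", peso)]:
    verdict = "appreciated" if pct > 0 else "depreciated"
    print(f"Dollar vs {name}: {pct:+.1f}% ({verdict})")
```

Comparing only the endpoints hides what happened in between, so the plot asked for in part a is still needed to describe the path of each rate.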

1.4. The data behind the M1 money supply in Figure 1.5 are given in Table 1.5. Can you give reasons why the money supply has been increasing over the time period shown in the table?

1.5. Suppose you were to develop an economic model of criminal activities, say, the hours spent in criminal activities (e.g., selling illegal drugs). What variables would you consider in developing such a model? See if your model matches the one developed by the Nobel laureate economist Gary Becker. 18


TABLE 1.4 Exchange Rates for Nine Countries: 1985-2006

Year   Australia   Canada   China P.R.    Japan   Mexico   South Korea   Sweden   Switzerland   United Kingdom
1985      0.7003   1.3659       2.9434   238.47    0.257        872.45   8.6032        2.4552           1.2974
1986      0.6709   1.3896       3.4616   168.35    0.612        884.60   7.1273        1.7979           1.4677
1987      0.7014   1.3259       3.7314   144.60    1.378        826.16   6.3469        1.4918           1.6398
1988      0.7841   1.2306       3.7314   128.17    2.273        734.52   6.1370        1.4643           1.7813
1989      0.7919   1.1842       3.7673   138.07    2.461        674.13   6.4559        1.6369           1.6382
1990      0.7807   1.1668       4.7921   145.00    2.813        710.64   5.9231        1.3901           1.7841
1991      0.7787   1.1460       5.3337   134.59    3.018        736.73   6.0521        1.4356           1.7674
1992      0.7352   1.2085       5.5206   126.78    3.095        784.66   5.8258        1.4064           1.7663
1993      0.6799   1.2902       5.7795   111.08    3.116        805.75   7.7956        1.4781           1.5016
1994      0.7316   1.3664       8.6397   102.18    3.385        806.93   7.7161        1.3667           1.5319
1995      0.7407   1.3725       8.3700    93.96    6.447        772.69   7.1406        1.1812           1.5785
1996      0.7828   1.3638       8.3389   108.78    7.600        805.00   6.7082        1.2361           1.5607
1997      0.7437   1.3849       8.3193   121.06    7.918        953.19   7.6446        1.4514           1.6376
1998      0.6291   1.4836       8.3008   130.99    9.152      1,400.40   7.9522        1.4506           1.6573
1999      0.6454   1.4858       8.2783   113.73    9.553      1,189.84   8.2740        1.5045           1.6172
2000      0.5815   1.4855       8.2784   107.80    9.459      1,130.90   9.1735        1.6904           1.5156
2001      0.5169   1.5487       8.2770   121.57    9.337      1,292.02  10.3425        1.6891           1.4396
2002      0.5437   1.5704       8.2771   125.22    9.663      1,250.31   9.7233        1.5567           1.5025
2003      0.6524   1.4008       8.2772   115.94   10.793      1,192.08   8.0787        1.3450           1.6347
2004      0.7365   1.3017       8.2768   108.15   11.290      1,145.24   7.3480        1.2428           1.8330
2005      0.7627   1.2115       8.1936   110.11   10.894      1,023.75   7.4710        1.2459           1.8204
2006      0.7535   1.1340       7.9723   116.31   10.906        954.32   7.3718        1.2532           1.8434

Source: Economic Report of the President, 2007, Table B-110, p. 3








18 G. S. Becker, "Crime and Punishment: An Economic Approach," Journal of Political Economy, vol. 76, 1968, pp. 169-217.
TABLE 1.5 Seasonally Adjusted M1 Supply: 1959:01-1999:07 (billions of dollars)

Source: Federal Reserve Bank, USA.

Note: Each row gives six consecutive monthly figures, beginning with the month shown at the left.

1959:01   138.8900   139.3900   139.7400   139.6900   140.6800   141.1700
1959:07   141.7000   141.9000   141.0100   140.4700   140.3800   139.9500
1960:01   139.9800   139.8700   139.7500   139.5600   139.6100   139.5800
1960:07   140.1800   141.3100   141.1800   140.9200   140.8600   140.6900
1961:01   141.0600   141.6000   141.8700   142.1300   142.6600   142.8800
1961:07   142.9200   143.4900   143.7800   144.1400   144.7600   145.2000
1962:01   145.2400   145.6600   145.9600   146.4000   146.8400   146.5800
1962:07   146.4600   146.5700   146.3000   146.7100   147.2900   147.8200
1963:01   148.2600   148.9000   149.1700   149.7000   150.3900   150.4300
1963:07   151.3400   151.7800   151.9800   152.5500   153.6500   153.2900
1964:01   153.7400   154.3100   154.4800   154.7700   155.3300   155.6200
1964:07   156.8000   157.8200   158.7500   159.2400   159.9600   160.3000
1965:01   160.7100   160.9400   161.4700   162.0300   161.7000   162.1900
1965:07   163.0500   163.6800   164.8500   165.9700   166.7100   167.8500
1966:01   169.0800   169.6200   170.5100   171.8100   171.3300   171.5700
1966:07   170.3100   170.8100   171.9700   171.1600   171.3800   172.0300
1967:01   171.8600   172.9900   174.8100   174.1700   175.6800   177.0200
1967:07   178.1300   179.7100   180.6800   181.6400   182.3800   183.2600
1968:01   184.3300   184.7100   185.4700   186.6000   187.9900   189.4200
1968:07   190.4900   191.8400   192.7400   194.0200   196.0200   197.4100
1969:01   198.6900   199.3500   200.0200   200.7100   200.8100   201.2700
1969:07   201.6600   201.7300   202.1000   202.9000   203.5700   203.8800
1970:01   206.2200   205.0000   205.7500   206.7200   207.2200   207.5400
1970:07   207.9800   209.9300   211.8000   212.8800   213.6600   214.4100
1971:01   215.5400   217.4200   218.7700   220.0000   222.0200   223.4500
1971:07   224.8500   225.5800   226.4700   227.1600   227.7600   228.3200
1972:01   230.0900   232.3200   234.3000   235.5800   235.8900   236.6200
1972:07   238.7900   240.9300   243.1800   245.0200   246.4100   249.2500
1973:01   251.4700   252.1500   251.6700   252.7400   254.8900   256.6900
1973:07   257.5400   257.7600   257.8600   259.0400   260.9800   262.8800
1974:01   263.7600   265.3100   266.6800   267.2000   267.5600   268.4400
1974:07   269.2700   270.1200   271.0500   272.3500   273.7100   274.2000
1975:01   273.9000   275.0000   276.4200   276.1700   279.2000   282.4300
1975:07   283.6800   284.1500   285.6900   285.3900   286.8300   287.0700
1976:01   288.4200   290.7600   292.7000   294.6600   295.9300   296.1600
1976:07   297.2000   299.0500   299.6700   302.0400   303.5900   306.2500
1977:01   308.2600   311.5400   313.9400   316.0200   317.1900   318.7100
1977:07   320.1900   322.2700   324.4800   326.4000   328.6400   330.8700
1978:01   334.4000   335.3000   336.9600   339.9200   344.8600   346.8000
1978:07   347.6300   349.6600   352.2600   353.3500   355.4100   357.2800
1979:01   358.6000   359.9100   362.4500   368.0500   369.5900   373.3400
1979:07   377.2100   378.8200   379.2800   380.8700   380.8100   381.7700
1980:01   385.8500   389.7000   388.1300   383.4400   384.6000   389.4600
1980:07   394.9100   400.0600   405.3600   409.0600   410.3700   408.0600
1981:01   410.8300   414.3800   418.6900   427.0600   424.4300   425.5000
1981:07   427.9000   427.8500   427.4600   428.4500   430.8800   436.1700
1982:01   442.1300   441.4900   442.3700   446.7800   446.5300   447.8900
1982:07   449.0900   452.4900   457.5000   464.5700   471.1200   474.3000
1983:01   476.6800   483.8500   490.1800   492.7700   499.7800   504.3500
1983:07   508.9600   511.6000   513.4100   517.2100   518.5300   520.7900
1984:01   524.4000   526.9900   530.7800   534.0300   536.5900   540.5400
1984:07   542.1300   542.3900   543.8600   543.8700   547.3200   551.1900
1985:01   555.6600   562.4800   565.7400   569.5500   575.0700   583.1700
1985:07   590.8200   598.0600   604.4700   607.9100   611.8300   619.3600
1986:01   620.4000   624.1400   632.8100   640.3500   652.0100   661.5200
1986:07   672.2000   680.7700   688.5100   695.2600   705.2400   724.2800
1987:01   729.3400   729.8400   733.0100   743.3900   746.0000   743.7200
1987:07   744.9600   746.9600   748.6600   756.5000   752.8300   749.6800
1988:01   755.5500   757.0700   761.1800   767.5700   771.6800   779.1000
1988:07   783.4000   785.0800   784.8200   783.6300   784.4600   786.2600
1989:01   784.9200   783.4000   782.7400   778.8200   774.7900   774.2200
1989:07   779.7100   781.1400   782.2000   787.0500   787.9500   792.5700
1990:01   794.9300   797.6500   801.2500   806.2400   804.3600   810.3300
1990:07   811.8000   817.8500   821.8300   820.3000   822.0600   824.5600
1991:01   826.7300   832.4000   838.6200   842.7300   848.9600   858.3300
1991:07   862.9500   868.6500   871.5600   878.4000   887.9500   896.7000
1992:01   910.4900   925.1300   936.0000   943.8900   950.7800   954.7100
1992:07   964.6000   975.7100   988.8400   1004.340   1016.040   1024.450
1993:01   1030.900   1033.150   1037.990   1047.470   1066.220   1075.610
1993:07   1085.880   1095.560   1105.430   1113.800   1123.900   1129.310
1994:01   1132.200   1136.130   1139.910   1141.420   1142.850   1145.650
1994:07   1151.490   1151.390   1152.440   1150.410   1150.440   1149.750
1995:01   1150.640   1146.740   1146.520   1149.480   1144.650   1144.240
1995:07   1146.500   1146.100   1142.270   1136.430   1133.550   1126.730
1996:01   1122.580   1117.530   1122.590   1124.520   1116.300   1115.470
1996:07   1112.340   1102.180   1095.610   1082.560   1080.490   1081.340
1997:01   1080.520   1076.200   1072.420   1067.450   1063.370   1065.990
1997:07   1067.570   1072.080   1064.820   1062.060   1067.530   1074.870
1998:01   1073.810   1076.020   1080.650   1082.090   1078.170   1077.780
1998:07   1075.370   1072.210   1074.650   1080.400   1088.960   1093.350
1999:01   1091.000   1092.650   1102.010   1108.400   1104.750   1101.110
1999:07   1099.530   1102.400   1093.460


1.6. Controlled experiments in economics: On April 7, 2000, President Clinton signed into law a bill passed by both Houses of the U.S. Congress that lifted earnings limitations on Social Security recipients. Until then, recipients between the ages of 65 and 69 who earned more than $17,000 a year would lose $1 worth of Social Security benefit for every $3 of income earned in excess of $17,000. How would you devise a study to assess the impact of this change in the law? Note: There was no income limitation for recipients over the age of 70 under the old law.

1.7. The data presented in Table 1.6 were published in the March 1, 1984, issue of The Wall Street Journal. They relate to the advertising budget (in millions of dollars) of 21 firms for 1983 and millions of impressions retained per week by the viewers of the products of these firms. The data are based on a survey of 4000 adults in which users of the products were asked to cite a commercial they had seen for the product category in the past week.

a. Plot impressions on the vertical axis and advertising expenditure on the horizontal 
axis. 

b. What can you say about the nature of the relationship between the two variables? 

c. Looking at your graph, do you think it pays to advertise? Think about all those 
commercials shown on Super Bowl Sunday or during the World Series. 

Note: We will explore further the data given in Table 1.6 in subsequent chapters. 
TABLE 1.6

Impact of Advertising Expenditure

Firm                      Impressions,   Expenditure,
                          millions       millions of 1983 dollars
 1. Miller Lite             32.1           50.1
 2. Pepsi                   99.6           74.1
 3. Stroh's                 11.7           19.3
 4. Fed'l Express           21.9           22.9
 5. Burger King             60.8           82.4
 6. Coca-Cola               78.6           40.1
 7. McDonald's              92.4          185.9
 8. MCI                     50.7           26.9
 9. Diet Cola               21.4           20.4
10. Ford                    40.1          166.2
11. Levi's                  40.8           27.0
12. Bud Lite                10.4           45.6
13. ATT/Bell                88.9          154.9
14. Calvin Klein            12.0            5.0
15. Wendy's                 29.2           49.7
16. Polaroid                38.0           26.9
17. Shasta                  10.0            5.7
18. Meow Mix                12.3            7.6
19. Oscar Meyer             23.4            9.2
20. Crest                   71.1           32.4
21. Kibbles 'N Bits          4.4            6.1

Source: http://lib.stat.cmu.edu/DASL/Datafiles/tvadsdat.html.
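Before plotting, it can help to have Table 1.6 in machine-readable form. The Python sketch below stores the table and computes the Pearson correlation between expenditure and impressions as a rough numerical companion to the scatter plot of part (a); it is not a substitute for the plot, and the helper name pearson_r is illustrative. A plotting library such as matplotlib could draw the scatter itself but is not assumed here.

```python
from math import sqrt

# Table 1.6: (firm, impressions in millions, expenditure in millions of 1983 dollars)
TV_ADS = [
    ("Miller Lite", 32.1, 50.1), ("Pepsi", 99.6, 74.1), ("Stroh's", 11.7, 19.3),
    ("Fed'l Express", 21.9, 22.9), ("Burger King", 60.8, 82.4), ("Coca-Cola", 78.6, 40.1),
    ("McDonald's", 92.4, 185.9), ("MCI", 50.7, 26.9), ("Diet Cola", 21.4, 20.4),
    ("Ford", 40.1, 166.2), ("Levi's", 40.8, 27.0), ("Bud Lite", 10.4, 45.6),
    ("ATT/Bell", 88.9, 154.9), ("Calvin Klein", 12.0, 5.0), ("Wendy's", 29.2, 49.7),
    ("Polaroid", 38.0, 26.9), ("Shasta", 10.0, 5.7), ("Meow Mix", 12.3, 7.6),
    ("Oscar Meyer", 23.4, 9.2), ("Crest", 71.1, 32.4), ("Kibbles 'N Bits", 4.4, 6.1),
]


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


impressions = [imp for _, imp, _ in TV_ADS]
spending = [spend for _, _, spend in TV_ADS]

r = pearson_r(spending, impressions)
print(f"{len(TV_ADS)} firms, r = {r:.3f}")

# Firms ordered by budget, to read alongside the scatter plot.
for firm, imp, spend in sorted(TV_ADS, key=lambda row: row[2], reverse=True):
    print(f"{firm:<16} spend {spend:6.1f}  impressions {imp:5.1f}")
```

The sorted listing makes it easy to spot firms, such as Ford, whose impressions are modest relative to their budgets, which is worth keeping in mind when answering part (c).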






Chapter 2

Two-Variable Regression Analysis: Some Basic Ideas


In Chapter 1 we discussed the concept of regression in broad terms. In this chapter we
approach the subject somewhat formally. Specifically, this and the following three chapters
introduce the reader to the theory underlying the simplest possible regression analysis,
namely, the bivariate, or two-variable, regression in which the dependent variable (the
regressand) is related to a single explanatory variable (the regressor). This case is
considered first, not because of its practical adequacy, but because it presents the fundamental
ideas of regression analysis as simply as possible, and some of these ideas can be illustrated
with the aid of two-dimensional graphs. Moreover, as we shall see, the more general
multiple regression analysis, in which the regressand is related to two or more regressors, is
in many ways a logical extension of the two-variable case.


2.1 A Hypothetical Example¹

As noted in Section 1.2, regression analysis is largely concerned with estimating and/or
predicting the (population) mean value of the dependent variable on the basis of the
known or fixed values of the explanatory variable(s).² To understand this, consider the data
given in Table 2.1. The data in the table refer to a total population of 60 families in a
hypothetical community and their weekly income (X) and weekly consumption expenditure
(Y), both in dollars. The 60 families are divided into 10 income groups (from $80 to $260),
and the weekly expenditures of each family in the various groups are as shown in the table.
Therefore, we have 10 fixed values of X and the corresponding Y values against each of the
X values; so to speak, there are 10 Y subpopulations.

There is considerable variation in weekly consumption expenditure in each income
group, which can be seen clearly from Figure 2.1. But the general picture that one gets is
that, despite the variability of weekly consumption expenditure within each income
bracket, on the average, weekly consumption expenditure increases as income increases.
To see this clearly, in Table 2.1 we have given the mean, or average, weekly consumption
expenditure corresponding to each of the 10 levels of income. Thus, corresponding to the
weekly income level of $80, the mean consumption expenditure is $65, while corresponding
to the income level of $200, it is $137. In all we have 10 mean values for the 10
subpopulations of Y. We call these mean values conditional expected values, as they depend
on the given values of the (conditioning) variable X. Symbolically, we denote them as
E(Y | X), which is read as the expected value of Y given the value of X (see also Table 2.2).


1 The reader whose statistical knowledge has become somewhat rusty may want to freshen it up by
reading the statistical appendix, Appendix A, before reading this chapter.

2 The expected value, or expectation, or population mean of a random variable Y is denoted by the
symbol E(Y). On the other hand, the mean value computed from a sample of values from the Y
population is denoted as Ȳ, read as "Y bar."


TABLE 2.1

Weekly family income X, $           80   100   120   140   160   180   200   220   240   260

Weekly family consumption           55    65    79    80   102   110   120   135   137   150
expenditure Y, $                    60    70    84    93   107   115   136   137   145   152
                                    65    74    90    95   110   120   140   140   155   175
                                    70    80    94   103   116   130   144   152   165   178
                                    75    85    98   108   118   135   145   157   175   180
                                     -    88     -   113   125   140     -   160   189   185
                                     -     -     -   115     -     -     -   162     -   191

Total                              325   462   445   707   678   750   685  1043   966  1211

Conditional means of Y, E(Y | X)    65    77    89   101   113   125   137   149   161   173

It is important to distinguish these conditional expected values from the unconditional
expected value of weekly consumption expenditure, E(Y). If we add the weekly consumption
expenditures for all the 60 families in the population and divide this number by 60, we
get the number $121.20 ($7272/60), which is the unconditional mean, or expected, value
of weekly consumption expenditure, E(Y); it is unconditional in the sense that in arriving
at this number we have disregarded the income levels of the various families.³ Obviously,
the various conditional expected values of Y given in Table 2.1 are different from the
unconditional expected value of Y of $121.20. When we ask the question, “What is the
expected value of weekly consumption expenditure of a family?” we get the answer $121.20
(the unconditional mean). But if we ask the question, “What is the expected value
of weekly consumption expenditure of a family whose weekly income is, say, $140?” we
get the answer $101 (the conditional mean). To put it differently, if we ask the question,
“What is the best (mean) prediction of weekly expenditure of families with a weekly
income of $140?” the answer would be $101. Thus knowledge of the income level may
enable us to predict the mean value of consumption expenditure better than if we did not
have that knowledge.⁴ This probably is the essence of regression analysis, as we shall
discover throughout this text.


FIGURE 2.1

Conditional distribution of expenditure for various levels of income (data of Table 2.1).


3 As shown in Appendix A, in general the conditional and unconditional mean values are different.


TABLE 2.2

Probabilities p(Y | Xᵢ) for the Data of Table 2.1
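The distinction between conditional and unconditional means is easy to check by direct computation. The Python sketch below enters Table 2.1 as a dictionary, computes each E(Y | X) as the sum of y·p(y | X), and then computes E(Y) over all 60 families. It assumes, for illustration, that every family in an income group is equally likely, so that p(y | X) = 1/n for a group of n families; exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Table 2.1: weekly consumption expenditures Y ($) of the families at each weekly income X ($).
TABLE_2_1 = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def conditional_mean(x):
    """E(Y | X = x) as the sum of y * p(y | x), with p(y | x) = 1/n for each family."""
    ys = TABLE_2_1[x]
    p = Fraction(1, len(ys))  # each family in the subpopulation is equally likely
    return sum(y * p for y in ys)


cond_means = {x: conditional_mean(x) for x in TABLE_2_1}

all_ys = [y for ys in TABLE_2_1.values() for y in ys]
unconditional_mean = Fraction(sum(all_ys), len(all_ys))

print({x: int(m) for x, m in cond_means.items()})
# {80: 65, 100: 77, 120: 89, 140: 101, 160: 113, 180: 125, 200: 137, 220: 149, 240: 161, 260: 173}
print(len(all_ys), float(unconditional_mean))  # 60 121.2
```

The ten conditional means reproduce the last row of Table 2.1, and the single unconditional mean ignores income altogether.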

The dark circled points in Figure 2.1 show the conditional mean values of Y against the
various X values. If we join these conditional mean values, we obtain what is known as the
population regression line (PRL), or more generally, the population regression curve.⁵
More simply, it is the regression of Y on X. The adjective “population” comes from the fact
that we are dealing in this example with the entire population of 60 families. Of course, in
reality a population may have many families.

Geometrically, then, a population regression curve is simply the locus of the conditional
means of the dependent variable for the fixed values of the explanatory variable(s). More
simply, it is the curve connecting the means of the subpopulations of Y corresponding to the
given values of the regressor X. It can be depicted as in Figure 2.2.

This figure shows that for each X (i.e., income level) there is a population of Y values
(weekly consumption expenditures) that are spread around the (conditional) mean of those
Y values. For simplicity, we are assuming that these Y values are distributed symmetrically
around their respective (conditional) mean values. And the regression line (or curve) passes
through these (conditional) mean values.

With this background, the reader may find it instructive to reread the definition of 
regression given in Section 1.2. 


4 I am indebted to James Davidson on this perspective. See James Davidson, Econometric Theory, 
Blackwell Publishers, Oxford, U.K., 2000, p. 11. 

5 In the present example the PRL is a straight line, but it could be a curve (see Figure 2.3).
FIGURE 2.2

Population regression line (data of Table 2.1).



2.2 The Concept of Population Regression Function (PRF) 

From the preceding discussion and Figures 2.1 and 2.2, it is clear that each conditional
mean E(Y | Xᵢ) is a function of Xᵢ, where Xᵢ is a given value of X. Symbolically,

\[ E(Y \mid X_i) = f(X_i) \tag{2.2.1} \]

where f(Xᵢ) denotes some function of the explanatory variable X. In our example,
E(Y | Xᵢ) is a linear function of Xᵢ. Equation 2.2.1 is known as the conditional expectation
function (CEF) or population regression function (PRF) or population regression (PR)
for short. It states merely that the expected value of the distribution of Y given Xᵢ is
functionally related to Xᵢ. In simple terms, it tells how the mean or average response of Y
varies with X.

What form does the function f(Xᵢ) assume? This is an important question because in
real situations we do not have the entire population available for examination. The
functional form of the PRF is therefore an empirical question, although in specific cases theory
may have something to say. For example, an economist might posit that consumption
expenditure is linearly related to income. Therefore, as a first approximation or a working
hypothesis, we may assume that the PRF E(Y | Xᵢ) is a linear function of Xᵢ, say, of the type

\[ E(Y \mid X_i) = \beta_1 + \beta_2 X_i \tag{2.2.2} \]

where β₁ and β₂ are unknown but fixed parameters known as the regression coefficients; β₁
and β₂ are also known as intercept and slope coefficients, respectively. Equation 2.2.2 itself
is known as the linear population regression function. Some alternative expressions
used in the literature are linear population regression model or simply linear population
regression. In the sequel, the terms regression, regression equation, and regression model
will be used synonymously.
In regression analysis our interest is in estimating the PRFs like Equation 2.2.2, that is,
estimating the values of the unknowns β₁ and β₂ on the basis of observations on Y and X.
This topic will be studied in detail in Chapter 3.
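Because Table 2.1 contains the whole (hypothetical) population, the parameters of Eq. (2.2.2) need not be estimated for that example: they can be read off the conditional means. The Python sketch below takes the line through the first two conditional means and checks that the other eight lie on it, which confirms that the PRF of this example is a straight line (footnote 5). With only a sample in hand this shortcut is unavailable, which is why estimation is needed.

```python
from fractions import Fraction

# Conditional means E(Y | X) from Table 2.1 (weekly income X and consumption Y, in $).
COND_MEANS = {80: 65, 100: 77, 120: 89, 140: 101, 160: 113,
              180: 125, 200: 137, 220: 149, 240: 161, 260: 173}

xs = sorted(COND_MEANS)
# Slope and intercept of the line through the first two conditional means.
beta2 = Fraction(COND_MEANS[xs[1]] - COND_MEANS[xs[0]], xs[1] - xs[0])
beta1 = COND_MEANS[xs[0]] - beta2 * xs[0]

# Every conditional mean should satisfy E(Y | X) = beta1 + beta2 * X exactly.
on_line = all(beta1 + beta2 * x == m for x, m in COND_MEANS.items())
print(beta1, beta2, on_line)  # 17 3/5 True
```

So for this population β₁ = 17 and β₂ = 0.6: each additional dollar of weekly income raises mean weekly consumption by 60 cents.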


2.3 The Meaning of the Term Linear 

Since this text is concerned primarily with linear models like Eq. (2.2.2), it is essential to 
know what the term linear really means, for it can be interpreted in two different ways. 


Linearity in the Variables 

The first and perhaps more “natural” meaning of linearity is that the conditional
expectation of Y is a linear function of Xᵢ, such as, for example, Eq. (2.2.2).⁶ Geometrically, the
regression curve in this case is a straight line. In this interpretation, a regression function
such as E(Y | Xᵢ) = β₁ + β₂Xᵢ² is not a linear function because the variable X appears with
a power or index of 2.


Linearity in the Parameters 

The second interpretation of linearity is that the conditional expectation of Y, E(Y | Xᵢ),
is a linear function of the parameters, the β’s; it may or may not be linear in the variable
X.⁷ In this interpretation E(Y | Xᵢ) = β₁ + β₂Xᵢ² is a linear (in the parameter)
regression model. To see this, let us suppose X takes the value 3. Therefore,
E(Y | X = 3) = β₁ + 9β₂, which is obviously linear in β₁ and β₂. All the models shown in
Figure 2.3 are thus linear regression models, that is, models linear in the parameters.

Now consider the model E(Y | Xᵢ) = β₁ + β₂²Xᵢ. Now suppose X = 3; then we obtain
E(Y | Xᵢ) = β₁ + 3β₂², which is nonlinear in the parameter β₂. The preceding model is
an example of a nonlinear (in the parameter) regression model. We will discuss such
models in Chapter 14.

Of the two interpretations of linearity, linearity in the parameters is relevant for the
development of the regression theory to be presented shortly. Therefore, from now on, the
term “linear” regression will always mean a regression that is linear in the parameters;
the β’s (that is, the parameters) are raised to the first power only. It may or may not be linear
in the explanatory variables, the X’s. Schematically, we have Table 2.3. Thus, E(Y | Xᵢ) =
β₁ + β₂Xᵢ, which is linear both in the parameters and variable, is a linear regression model
(LRM), and so is E(Y | Xᵢ) = β₁ + β₂Xᵢ², which is linear in the parameters but nonlinear in variable X.
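The distinction can be spot-checked numerically. With X held fixed, a mean function that is linear in the parameters (and has no term free of them) must satisfy f(a + b) = f(a) + f(b) for any two parameter pairs a = (a₁, a₂) and b = (b₁, b₂). The Python sketch below applies that check at X = 3 to the two models discussed above; the helper names are illustrative. Failing the check proves nonlinearity in the parameters, while passing a single check is only consistent with linearity, not a proof of it.

```python
def mean_quadratic_in_x(b1, b2, x):
    # E(Y | X) = b1 + b2 * X**2: nonlinear in X, linear in the parameters
    return b1 + b2 * x ** 2


def mean_squared_param(b1, b2, x):
    # E(Y | X) = b1 + b2**2 * X: linear in X, nonlinear in the parameter b2
    return b1 + b2 ** 2 * x


def additive_in_params(model, x, a, b):
    """True if model(a + b) == model(a) + model(b) at a fixed X (one spot-check)."""
    summed = model(a[0] + b[0], a[1] + b[1], x)
    return summed == model(*a, x) + model(*b, x)


x = 3
print(additive_in_params(mean_quadratic_in_x, x, (1, 2), (4, 5)))  # True
print(additive_in_params(mean_squared_param, x, (1, 2), (4, 5)))   # False
```

At X = 3 the first model is 1 + 9·2 = 19 and 4 + 9·5 = 49, summing to 68 = 5 + 9·7; the second gives 13 + 79 = 92, not 5 + 3·49 = 152.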


6 A function Y = f(X) is said to be linear in X if X appears with a power or index of 1 only (that is,
terms such as X², √X, and so on, are excluded) and is not multiplied or divided by any other variable
(for example, X · Z or X/Z, where Z is another variable). If Y depends on X alone, another way to
state that Y is linearly related to X is that the rate of change of Y with respect to X (i.e., the slope, or
derivative, of Y with respect to X, dY/dX) is independent of the value of X. Thus, if Y = 4X, dY/dX = 4,
which is independent of the value of X. But if Y = 4X², dY/dX = 8X, which is not independent of
the value taken by X. Hence this function is not linear in X.

7 A function is said to be linear in the parameter, say, β₁, if β₁ appears with a power of 1 only and is
not multiplied or divided by any other parameter (for example, β₁β₂, β₂/β₁, and so on).



FIGURE 2.3

Linear-in-parameter functions.


TABLE 2.3

Linear Regression Models



2.4 Stochastic Specification of PRF 

It is clear from Figure 2.1 that, as family income increases, family consumption expenditure 
on the average increases, too. But what about the consumption expenditure of an individual 
family in relation to its (fixed) level of income? It is obvious from Table 2.1 and Figure 2.1 
that an individual family’s consumption expenditure does not necessarily increase as the 
income level increases. For example, from Table 2.1 we observe that corresponding to the 
income level of $100 there is one family whose consumption expenditure of $65 is less than 
the consumption expenditures of two families whose weekly income is only $80. But notice 
that the average consumption expenditure of families with a weekly income of $100 is 
greater than the average consumption expenditure of families with a weekly income of 
$80 ($77 versus $65). 

What, then, can we say about the relationship between an individual family’s
consumption expenditure and a given level of income? We see from Figure 2.1 that, given the
income level of Xᵢ, an individual family’s consumption expenditure is clustered around the
average consumption of all families at that Xᵢ, that is, around its conditional expectation.
Therefore, we can express the deviation of an individual Yᵢ around its expected value as
follows:


\[ u_i = Y_i - E(Y \mid X_i) \]

or

\[ Y_i = E(Y \mid X_i) + u_i \tag{2.4.1} \]

where the deviation uᵢ is an unobservable random variable taking positive or negative
values. Technically, uᵢ is known as the stochastic disturbance or stochastic error term.

How do we interpret Equation 2.4.1? We can say that the expenditure of an individual
family, given its income level, can be expressed as the sum of two components:
(1) E(Y | Xᵢ), which is simply the mean consumption expenditure of all the families with
the same level of income. This component is known as the systematic, or deterministic,
component, and (2) uᵢ, which is the random, or nonsystematic, component. We shall
examine shortly the nature of the stochastic disturbance term, but for the moment assume
that it is a surrogate or proxy for all the omitted or neglected variables that may affect Y but
are not (or cannot be) included in the regression model.

If E(Y | Xᵢ) is assumed to be linear in Xᵢ, as in Eq. (2.2.2), Eq. (2.4.1) may be written as

\[
\begin{aligned}
Y_i &= E(Y \mid X_i) + u_i \\
    &= \beta_1 + \beta_2 X_i + u_i
\end{aligned}
\tag{2.4.2}
\]

Equation 2.4.2 posits that the consumption expenditure of a family is linearly related to its
income plus the disturbance term. Thus, the individual consumption expenditures, given
X = $80 (see Table 2.1), can be expressed as

\[
\begin{aligned}
Y_1 &= 55 = \beta_1 + \beta_2(80) + u_1 \\
Y_2 &= 60 = \beta_1 + \beta_2(80) + u_2 \\
Y_3 &= 65 = \beta_1 + \beta_2(80) + u_3 \\
Y_4 &= 70 = \beta_1 + \beta_2(80) + u_4 \\
Y_5 &= 75 = \beta_1 + \beta_2(80) + u_5
\end{aligned}
\tag{2.4.3}
\]
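Eq. (2.4.3) can be made concrete because Table 2.1 is the entire population: its conditional means lie exactly on the line 17 + 0.6X, so for this hypothetical population β₁ = 17 and β₂ = 0.6. The Python sketch below, which assumes those values, splits each of the five observations at X = $80 into the systematic component β₁ + β₂(80) and the disturbance uᵢ.

```python
# Weekly consumption of the five families with weekly income X = $80 (Table 2.1).
Y_AT_80 = [55, 60, 65, 70, 75]

# PRF of this hypothetical population: its conditional means lie on 17 + 0.6 X.
BETA1, BETA2 = 17, 0.6

systematic = BETA1 + BETA2 * 80                    # E(Y | X = 80), the deterministic component
disturbances = [y - systematic for y in Y_AT_80]   # u_i = Y_i - E(Y | X_i)
print(systematic, disturbances)
```

The disturbances come out as −10, −5, 0, 5, and 10 dollars. They can be computed here only because the PRF is known; in practice uᵢ is unobservable.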

Now if we take the expected value of Eq. (2.4.1) on both sides, we obtain

\[
\begin{aligned}
E(Y_i \mid X_i) &= E[E(Y \mid X_i)] + E(u_i \mid X_i) \\
                &= E(Y \mid X_i) + E(u_i \mid X_i)
\end{aligned}
\tag{2.4.4}
\]

where use is made of the fact that the expected value of a constant is that constant itself.⁸
Notice carefully that in Equation 2.4.4 we have taken the conditional expectation,
conditional upon the given X’s.

Since E(Yᵢ | Xᵢ) is the same thing as E(Y | Xᵢ), subtracting it from both sides of
Eq. (2.4.4) shows that

\[ E(u_i \mid X_i) = 0 \tag{2.4.5} \]


8 See Appendix A for a brief discussion of the properties of the expectation operator E. Note that
E(Y | Xᵢ), once the value of Xᵢ is fixed, is a constant.
Thus, the assumption that the regression line passes through the conditional means of Y
(see Figure 2.2) implies that the conditional mean values of uᵢ (conditional upon the given
X’s) are zero.
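For the population of Table 2.1, Eq. (2.4.5) can be verified income group by income group. The Python sketch below, assuming the population PRF E(Y | X) = 17 + 0.6X read off the conditional means of Table 2.1, averages the disturbances within each of the 10 groups, using exact fractions so that a zero is really zero.

```python
from fractions import Fraction

# Table 2.1: weekly consumption Y ($) of the families at each weekly income X ($).
TABLE_2_1 = {
    80: [55, 60, 65, 70, 75], 100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98], 140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125], 180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145], 220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189], 260: [150, 152, 175, 178, 180, 185, 191],
}
BETA1, BETA2 = 17, Fraction(3, 5)  # population PRF of Table 2.1


def mean_disturbance(x):
    """Average of u_i = Y_i - (BETA1 + BETA2 * X_i) over the families with income x."""
    ys = TABLE_2_1[x]
    e_y = BETA1 + BETA2 * x
    return sum(Fraction(y) - e_y for y in ys) / len(ys)


mean_u = {x: mean_disturbance(x) for x in TABLE_2_1}
print(all(m == 0 for m in mean_u.values()))  # True
```

Individual disturbances are positive or negative, but within every income group they cancel exactly, because the PRF passes through each conditional mean.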

From the previous discussion, it is clear that Eq. (2.2.2) and Eq. (2.4.2) are equivalent forms
if E(uᵢ | Xᵢ) = 0.⁹ But the stochastic specification in Eq. (2.4.2) has the advantage that it
clearly shows that there are other variables besides income that affect consumption
expenditure and that an individual family’s consumption expenditure cannot be fully explained
only by the variable(s) included in the regression model.


2.5 The Significance of the Stochastic Disturbance Term 

As noted in Section 2.4, the disturbance term uᵢ is a surrogate for all those variables that
are omitted from the model but that collectively affect Y. The obvious question is: Why not 
introduce these variables into the model explicitly? Stated otherwise, why not develop a 
multiple regression model with as many variables as possible? The reasons are many. 

1. Vagueness of theory: The theory, if any, determining the behavior of Y may be, and
often is, incomplete. We might know for certain that weekly income X influences weekly
consumption expenditure Y, but we might be ignorant or unsure about the other variables
affecting Y. Therefore, uᵢ may be used as a substitute for all the excluded or omitted
variables from the model.

2. Unavailability of data: Even if we know what some of the excluded variables are and
therefore consider a multiple regression rather than a simple regression, we may not have
quantitative information about these variables. It is a common experience in empirical
analysis that the data we would ideally like to have often are not available. For example, in
principle we could introduce family wealth as an explanatory variable in addition to the
income variable to explain family consumption expenditure. But unfortunately, information
on family wealth generally is not available. Therefore, we may be forced to omit the wealth
variable from our model despite its great theoretical relevance in explaining consumption
expenditure.

3. Core variables versus peripheral variables: Assume in our consumption-income
example that besides income X₁, the number of children per family X₂, sex X₃, religion X₄,
education X₅, and geographical region X₆ also affect consumption expenditure. But it is quite
possible that the joint influence of all or some of these variables may be so small and at best
nonsystematic or random that as a practical matter and for cost considerations it does not pay
to introduce them into the model explicitly. One hopes that their combined effect can be
treated as a random variable uᵢ.¹⁰

4. Intrinsic randomness in human behavior: Even if we succeed in introducing all the
relevant variables into the model, there is bound to be some “intrinsic” randomness in
individual Y’s that cannot be explained no matter how hard we try. The disturbances, the u’s,
may very well reflect this intrinsic randomness.

5. Poor proxy variables: Although the classical regression model (to be developed in
Chapter 3) assumes that the variables Y and X are measured accurately, in practice the data
may be plagued by errors of measurement. Consider, for example, Milton Friedman’s
well-known theory of the consumption function.¹¹ He regards permanent consumption (Yᵖ) as
a function of permanent income (Xᵖ). But since data on these variables are not directly
observable, in practice we use proxy variables, such as current consumption (Y) and current
income (X), which are observable. Since the observed Y and X may not equal Yᵖ and
Xᵖ, there is the problem of errors of measurement. The disturbance term u may in this case
then also represent the errors of measurement. As we will see in a later chapter, if there are
such errors of measurement, they can have serious implications for estimating the
regression coefficients, the β’s.

9 As a matter of fact, in the method of least squares to be developed in Chapter 3, it is assumed
explicitly that E(uᵢ | Xᵢ) = 0. See Sec. 3.2.

10 A further difficulty is that variables such as sex, education, and religion are difficult to quantify.

6. Principle of parsimony: Following Occam’s razor,¹² we would like to keep our
regression model as simple as possible. If we can explain the behavior of Y “substantially”
with two or three explanatory variables and if our theory is not strong enough to suggest
what other variables might be included, why introduce more variables? Let uᵢ represent all
other variables. Of course, we should not exclude relevant and important variables just to
keep the regression model simple.

7. Wrong functional form: Even if we have theoretically correct variables explaining a
phenomenon and even if we can obtain data on these variables, very often we do not know
the form of the functional relationship between the regressand and the regressors. Is
consumption expenditure a linear (in the variable) function of income or a nonlinear (in the variable)
function? If it is the former, Yᵢ = β₁ + β₂Xᵢ + uᵢ is the proper functional relationship
between Y and X, but if it is the latter, Yᵢ = β₁ + β₂Xᵢ + β₃Xᵢ² + uᵢ may be the correct
functional form. In two-variable models the functional form of the relationship can often
be judged from the scattergram. But in a multiple regression model, it is not easy to
determine the appropriate functional form, for graphically we cannot visualize scattergrams in
multiple dimensions.

For all these reasons, the stochastic disturbances uᵢ assume an extremely critical role in
regression analysis, which we will see as we progress.


2.6 The Sample Regression Function (SRF) 

By confining our discussion so far to the population of Y values corresponding to the fixed
X’s, we have deliberately avoided sampling considerations (note that the data of Table 2.1
represent the population, not a sample). But it is about time to face up to the sampling
problems, for in most practical situations what we have is but a sample of Y values
corresponding to some fixed X’s. Therefore, our task now is to estimate the PRF on the basis of the
sample information.

As an illustration, pretend that the population of Table 2.1 was not known to us and the
only information we had was a randomly selected sample of Y values for the fixed X’s
as given in Table 2.4. Unlike Table 2.1, we now have only one Y value corresponding to
the given X’s; each Y (given Xᵢ) in Table 2.4 is chosen randomly from similar Y’s
corresponding to the same Xᵢ from the population of Table 2.1.


11 Milton Friedman, A Theory of the Consumption Function, Princeton University Press, Princeton, N.J.,
1957.

12 "That descriptions be kept as simple as possible until proved inadequate," The World of Mathematics, 
vol. 2, J. R. Newman (ed.), Simon & Schuster, New York, 1956, p. 1247, or, "Entities should not be 
multiplied beyond necessity," Donald F. Morrison, Applied Linear Statistical Methods, Prentice Hall, 
Englewood Cliffs, N.J., 1983, p. 58. 
The question is: From the sample of Table 2.4 can we predict the average weekly
consumption expenditure Y in the population as a whole corresponding to the chosen X’s? In
other words, can we estimate the PRF from the sample data? As the reader surely suspects,
we may not be able to estimate the PRF “accurately” because of sampling fluctuations. To
see this, suppose we draw another random sample from the population of Table 2.1, as
presented in Table 2.5.

TABLE 2.4

A Random Sample from the Population of Table 2.1

  Y      X
 70     80
 65    100
 90    120
 95    140
110    160
115    180
120    200
140    220
155    240
150    260


TABLE 2.5

Another Random Sample from the Population of Table 2.1

  Y      X
 55     80
 88    100
 90    120
 80    140
118    160
120    180
145    200
135    220
145    240
175    260


Plotting the data of Tables 2.4 and 2.5, we obtain the scattergram given in Figure 2.4. In
the scattergram two sample regression lines are drawn so as to “fit” the scatters reasonably
well: SRF₁ is based on the first sample, and SRF₂ is based on the second sample. Which of
the two regression lines represents the “true” population regression line? If we avoid the
temptation of looking at Figure 2.1, which purportedly represents the PR, there is no way
we can be absolutely sure that either of the regression lines shown in Figure 2.4 represents
the true population regression line (or curve). The regression lines in Figure 2.4 are known
as the sample regression lines. Supposedly they represent the population regression line,
but because of sampling fluctuations they are at best an approximation of the true PR. In
general, we would get N different SRFs for N different samples, and these SRFs are not
likely to be the same.
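The point that each sample yields its own line can be checked numerically. The Python sketch below fits a straight line to each of Tables 2.4 and 2.5 by ordinary least squares; that method is only developed in Chapter 3 and is used here purely as one concrete rule for "fitting" the scatter, not as a claim about how the lines in Figure 2.4 were drawn. Exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
SAMPLE_1 = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]   # Y values of Table 2.4
SAMPLE_2 = [55, 88, 90, 80, 118, 120, 145, 135, 145, 175]   # Y values of Table 2.5


def least_squares_line(xs, ys):
    """Intercept and slope of the least-squares line (method developed in Chapter 3)."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


for label, ys in (("SRF1", SAMPLE_1), ("SRF2", SAMPLE_2)):
    b1, b2 = least_squares_line(X, ys)
    print(f"{label}: Y-hat = {float(b1):.4f} + {float(b2):.4f} X")
# SRF1: Y-hat = 24.4545 + 0.5091 X
# SRF2: Y-hat = 17.1697 + 0.5761 X
```

The two sample lines differ from each other and from the population line 17 + 0.6X implied by the conditional means of Table 2.1, exactly the sampling fluctuation described above.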

Now, analogously to the PRF that underlies the population regression line, we can
develop the concept of the sample regression function (SRF) to represent the sample
regression line. The sample counterpart of Eq. (2.2.2) may be written as

\[ \hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i \tag{2.6.1} \]

where Ŷ is read as “Y-hat” or “Y-cap” and

Ŷᵢ = estimator of E(Y | Xᵢ)
β̂₁ = estimator of β₁
β̂₂ = estimator of β₂

Note that an estimator, also known as a (sample) statistic, is simply a rule or formula or
method that tells how to estimate the population parameter from the information provided by
the sample at hand. A particular numerical value obtained by the estimator in an application
is known as an estimate.¹³ It should be noted that an estimator is random, but an estimate is
nonrandom. (Why?)
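One way to see the difference is by simulation. The Python sketch below assumes a sampling scheme like that behind Tables 2.4 and 2.5 (one Y drawn at random from each income group of Table 2.1) and uses the least-squares slope of Chapter 3 as the estimator of β₂; both choices are illustrative. Applying the same rule to many random samples produces many different numbers, so the estimator is a random variable, while each printed estimate is one fixed number.

```python
import random
from statistics import mean

# Table 2.1: weekly consumption Y ($) of the families at each weekly income X ($).
TABLE_2_1 = {
    80: [55, 60, 65, 70, 75], 100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98], 140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125], 180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145], 220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189], 260: [150, 152, 175, 178, 180, 185, 191],
}


def ols_slope(xs, ys):
    """Least-squares slope, standing in for 'the estimator' of beta2."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def draw_sample(rng):
    """One Y per income level, picked at random from that subpopulation."""
    xs = sorted(TABLE_2_1)
    return xs, [rng.choice(TABLE_2_1[x]) for x in xs]


rng = random.Random(2024)  # fixed seed so the run is reproducible
estimates = [ols_slope(*draw_sample(rng)) for _ in range(1000)]

print(len(set(estimates)) > 1)     # the same rule gives different estimates
print(round(mean(estimates), 3))   # should be close to the population slope 0.6
```

The spread of these 1000 estimates is what the sampling distribution of an estimator, studied in later chapters, describes.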

Now just as we expressed the PRF in two equivalent forms, Eq. (2.2.2) and Eq. (2.4.2),
we can express the SRF in Equation 2.6.1 in its stochastic form as follows:

\[ Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i \tag{2.6.2} \]

where, in addition to the symbols already defined, ûᵢ denotes the (sample) residual term.
Conceptually ûᵢ is analogous to uᵢ and can be regarded as an estimate of uᵢ. It is introduced
in the SRF for the same reasons as uᵢ was introduced in the PRF.

To sum up, then, we find our primary objective in regression analysis is to estimate the
PRF

\[ Y_i = \beta_1 + \beta_2 X_i + u_i \tag{2.4.2} \]

on the basis of the SRF

\[ Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i \tag{2.6.2} \]

because more often than not our analysis is based upon a single sample from some
population. But because of sampling fluctuations, our estimate of the PRF based on the SRF is at
best an approximate one. This approximation is shown diagrammatically in Figure 2.5.


13 As noted in the Introduction, a hat above a variable will signify an estimator of the relevant 
population value. 


FIGURE 2.5

Sample and population regression lines.
For X = Xᵢ, we have one (sample) observation, Y = Yᵢ. In terms of the SRF, the
observed Yᵢ can be expressed as

\[ Y_i = \hat{Y}_i + \hat{u}_i \tag{2.6.3} \]

and in terms of the PRF, it can be expressed as

\[ Y_i = E(Y \mid X_i) + u_i \tag{2.6.4} \]

Now obviously in Figure 2.5 Ŷᵢ overestimates the true E(Y | Xᵢ) for the Xᵢ shown therein.
By the same token, for any Xᵢ to the left of the point A, the SRF will underestimate the true
PRF. But the reader can readily see that such over- and underestimation is inevitable
because of sampling fluctuations.
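Equations (2.6.3) and (2.6.4) can be compared side by side for the sample of Table 2.4, because in this hypothetical setting the PRF is known (E(Y | X) = 17 + 0.6X from the conditional means of Table 2.1). The Python sketch below fits the SRF by least squares, a method only developed in Chapter 3 and used here purely for illustration, and then lists for each Xᵢ the residual ûᵢ, the disturbance uᵢ, and whether the SRF lies above or below the PRF.

```python
from fractions import Fraction

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]   # Table 2.4


def prf(x):
    """Population line of Table 2.1, known here only because the population is known."""
    return 17 + Fraction(3, 5) * x


# SRF for Table 2.4, fitted by least squares (the method of Chapter 3), for illustration.
n = len(X)
x_bar, y_bar = Fraction(sum(X), n), Fraction(sum(Y), n)
b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b1 = y_bar - b2 * x_bar

rows = []
for x, y in zip(X, Y):
    y_hat = b1 + b2 * x                               # point on the SRF
    rows.append((x, y_hat, y - y_hat, y - prf(x)))    # (X_i, Y-hat_i, u-hat_i, u_i)

for x, y_hat, resid, dist in rows:
    where = "above" if y_hat > prf(x) else "below"
    print(f"X={x:3d}: SRF {where} PRF; u-hat = {float(resid):7.2f}, u = {float(dist):7.2f}")
```

For this sample the SRF lies above the PRF only at X = $80; the two lines cross at X = $82, so the crossing point plays the role of point A, with over- and underestimation on opposite sides from Figure 2.5 because this SRF is flatter than the PRF. In every row ûᵢ and uᵢ differ, since the SRF is not the PRF.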

The critical question now is: Granted that the SRF is but an approximation of the PRF,
can we devise a rule or a method that will make this approximation as “close” as possible?
In other words, how should the SRF be constructed so that β̂₁ is as “close” as possible to
the true β₁ and β̂₂ is as “close” as possible to the true β₂ even though we will never know
the true β₁ and β₂?

The answer to this question will occupy much of our attention in Chapter 3. We note 
here that we can develop procedures that tell us how to construct the SRF to mirror the PRF 
as faithfully as possible. It is fascinating to consider that this can be done even though we 
never actually determine the PRF itself. 


2.7 Illustrative Examples 


We conclude this chapter with two examples. 
EXAMPLE 2.1

Mean Hourly Wage by Education


Table 2.6 gives data on the level of education (measured by the number of years of
schooling), the mean hourly wages earned by people at each level of education, and the number
of people at the stated level of education. Ernst Berndt originally obtained the data
presented in the table, and he derived these data from the population survey conducted
in May 1985.¹⁴

Plotting the (conditional) mean wage against education, we obtain the picture in 
Figure 2.6. The regression curve in the figure shows how mean wages vary with the level 
of education; they generally increase with the level of education, a finding one should not 
find surprising. We will study in a later chapter how variables besides education can also 
affect the mean wage. 


TABLE 2.6

Mean Hourly Wage by Education

Years of Schooling   Mean Wage, $   Number of People
        6                 4.4567             3
        7                 5.7700             5
        8                 5.9787            15
        9                 7.3317            12
       10                 7.3182            17
       11                 6.5844            27
       12                 7.8182           218
       13                 7.8351            37
       14                11.0223            56
       15                10.6738            13
       16                10.8361            70
       17                13.6150            24
       18                13.5310             -
                                Total      528

Source: Arthur S. Goldberger, Introductory [...] University Press, Cambridge,
Mass., 1998, Table 1.1, p. 5 (adapted).
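The word "generally" in the text deserves a quick check. The Python sketch below enters Table 2.6, lists the schooling levels at which the mean wage is lower than at the preceding level, and, assuming the reported total of 528 is correct, recovers the count that the table shows as missing for 18 years of schooling. The variable names are illustrative.

```python
# Table 2.6: (years of schooling, mean hourly wage in $, number of people).
# The table shows "-" for the count at 18 years; None marks that gap here.
WAGES = [
    (6, 4.4567, 3), (7, 5.7700, 5), (8, 5.9787, 15), (9, 7.3317, 12),
    (10, 7.3182, 17), (11, 6.5844, 27), (12, 7.8182, 218), (13, 7.8351, 37),
    (14, 11.0223, 56), (15, 10.6738, 13), (16, 10.8361, 70), (17, 13.6150, 24),
    (18, 13.5310, None),
]
TOTAL_PEOPLE = 528  # the "Total 528" line of Table 2.6

# If the reported total is right, it pins down the missing count.
known = sum(n for _, _, n in WAGES if n is not None)
implied_count_18 = TOTAL_PEOPLE - known

# Schooling levels at which the mean wage is lower than one year earlier.
dips = [cur[0] for prev, cur in zip(WAGES, WAGES[1:]) if cur[1] < prev[1]]

print(implied_count_18, dips)  # 31 [10, 11, 15, 18]
```

The positive relationship holds on the whole but not at every step, which is why the text says mean wages "generally" increase with education.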



14 Ernst R. Berndt, The Practice of Econometrics: Classic and Contemporary, Addison Wesley, Reading, 
Mass., 1991. Incidentally, this is an excellent book that the reader may want to read to find out how 
econometricians go about doing research. 
EXAMPLE 2.2

Mathematics SAT Scores by Family Income


Table 2.10 in Exercise 2.17 provides data on mean SAT (Scholastic Aptitude Test) scores on 
critical reading, mathematics, and writing for college-bound seniors based on 947,347 
students taking the SAT examination in 2007. Plotting the mean mathematics scores on 
mean family income, we obtain the picture in Figure 2.7. 

Note: Because of the open-ended income brackets for the first and last income 
categories shown in Table 2.10, the lowest average family income is assumed to be 
$5,000 and the highest average family income is assumed to be $150,000. 


FIGURE 2.7 Relationship between mean mathematics SAT scores and mean family income. (Horizontal axis: average family income, $.)


As Figure 2.7 shows, the average mathematics score increases as average family income increases. Since the number of students taking the SAT examination is quite large, the data probably represent the entire population of seniors taking the examination. Therefore, the regression line sketched in Figure 2.7 probably represents the population regression line.

There may be several reasons for the observed positive relationship between the two 
variables. For example, one might argue that students with higher family income can 
better afford private tutoring for the SAT examinations. In addition, students with higher 
family income are more likely to have parents who are highly educated. It is also possible 
that students with higher mathematics scores come from better schools. The reader can 
provide other explanations for the observed positive relationship between the two 
variables. 
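One way to put a number on the positive association in Figure 2.7 is the sample correlation coefficient, which the text does not use here. The sketch below takes the $5,000 and $150,000 values from the note above for the open-ended classes and assumes bracket midpoints for the interior classes, so the result depends on that assumption. It summarizes the association only; it says nothing about which of the explanations above is correct.

```python
import math

# Mean mathematics scores by family-income class, Table 2.10.
mean_math = [451, 472, 465, 485, 486, 504, 511, 516, 529, 556]

# Representative incomes: $5,000 and $150,000 for the open-ended classes (as in
# the text's note); interior classes use bracket midpoints (an assumption here).
income = [5_000, 15_000, 25_000, 35_000, 45_000,
          55_000, 65_000, 75_000, 90_000, 150_000]


def correlation(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = correlation(income, mean_math)
print(f"r = {r:.3f}")
```

Note that the mean score for the $20,000-30,000 class (465) is slightly below that for the $10,000-20,000 class (472), so the positive relationship holds on average rather than class by class.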
Summary and Conclusions

1. The key concept underlying regression analysis is the concept of the conditional expectation function (CEF), or population regression function (PRF). Our objective in regression analysis is to find out how the average value of the dependent variable (or regressand) varies with the given value of the explanatory variable (or regressor).

2. This book largely deals with linear PRFs, that is, regressions that are linear in the parameters. They may or may not be linear in the regressand or the regressors.

3. For empirical purposes, it is the stochastic PRF that matters. The stochastic disturbance term $u_i$ plays a critical role in estimating the PRF.

4. The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Usually, one has a sample of observations from the population. Therefore, one uses the stochastic sample regression function (SRF) to estimate the PRF. How this is actually accomplished is discussed in Chapter 3.


EXERCISES

Questions

2.1. What is the conditional expectation function or the population regression function? 

2.2. What is the difference between the population and sample regression functions? Is 
this a distinction without difference? 

2.3. What is the role of the stochastic error term $u_i$ in regression analysis? What is the difference between the stochastic error term and the residual, $\hat{u}_i$?

2.4. Why do we need regression analysis? Why not simply use the mean value of the 
regressand as its best value? 

2.5. What do we mean by a linear regression model? 

2.6. Determine whether the following models are linear in the parameters, or the 
variables, or both. Which of these models are linear regression models? 


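Model                                                              Descriptive Title

a. $Y_i = \beta_1 + \beta_2 \left(\dfrac{1}{X_i}\right) + u_i$              Reciprocal

b. $Y_i = \beta_1 + \beta_2 \ln X_i + u_i$                                  Semilogarithmic

c. $\ln Y_i = \beta_1 + \beta_2 X_i + u_i$                                  Inverse semilogarithmic

d. $\ln Y_i = \ln \beta_1 + \beta_2 \ln X_i + u_i$                          Logarithmic or double logarithmic

e. $\ln Y_i = \beta_1 - \beta_2 \left(\dfrac{1}{X_i}\right) + u_i$          Logarithmic reciprocal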




2.7. Are the following models linear regression models? Why or why not? 

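a. $Y_i = e^{\beta_1 + \beta_2 X_i + u_i}$

b. $Y_i = \dfrac{1}{1 + e^{\beta_1 + \beta_2 X_i + u_i}}$

c. $\ln Y_i = \beta_1 + \beta_2 \left(\dfrac{1}{X_i}\right) + u_i$

d. $Y_i = \beta_1 + (0.75 - \beta_1)\, e^{-\beta_2 (X_i - 2)} + u_i$

e. $Y_i = \beta_1 + \beta_2^3 X_i + u_i$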
2.8. What is meant by an intrinsically linear regression model? If $\beta_2$ in Exercise 2.7d were 0.8, would it be a linear or nonlinear regression model?

2.9. Consider the following nonstochastic models (i.e., models without the stochastic 
error term). Are they linear regression models? If not, is it possible, by suitable 
algebraic manipulations, to convert them into linear models? 


a. $Y_i = \dfrac{1}{\beta_1 + \beta_2 X_i}$

b. $Y_i = \dfrac{X_i}{\beta_1 + \beta_2 X_i}$

c. $Y_i = \dfrac{1}{1 + \exp(-\beta_1 - \beta_2 X_i)}$


2.10. You are given the scattergram in Figure 2.8 along with the regression line. What 
general conclusion do you draw from this diagram? Is the regression line sketched in 
the diagram a population regression line or the sample regression line? 

2.11. From the scattergram given in Figure 2.9, what general conclusions do you draw? 
What is the economic theory that underlies this scattergram? (Hint: Look up any international economics textbook and read up on the Heckscher-Ohlin model of trade.)

2.12. What does the scattergram in Figure 2.10 reveal? On the basis of this diagram, would 
you argue that minimum wage laws are good for economic well-being? 

2.13. Is the regression line shown in Figure 1.3 of the Introduction the PRF or the SRF? 
Why? How would you interpret the scatterpoints around the regression line? Besides 
GDP, what other factors, or variables, might determine personal consumption 
expenditure? 


FIGURE 2.8 Growth rates of real manufacturing wages and exports. Data are for 50 developing countries during 1970-90. Points are grouped by region: East Asia and the Pacific; South Asia; Latin America and the Caribbean; Sub-Saharan Africa; Middle East and North Africa.

Source: World Bank, World Development Report 1995, p. 55. The original source is UNIDO data, World Bank data.
FIGURE 2.9 Skill intensity of exports and human capital endowment. Data are for 126 industrial and developing countries in 1985. Values along the horizontal axis are logarithms of the ratio of the country's average educational attainment to its land area; vertical axis values are logarithms of the ratio of manufactured to primary-products exports.

Source: World Bank, World Development Report 1995, p. 59. Original sources: Export data from United Nations Statistical Office COMTRADE database; education data from World Bank.



FIGURE 2.10 The minimum wage and GNP per capita. The sample consists of 17 developing countries. Years vary by country from 1988 to 1992. Data are in international prices. (Vertical axis: ratio of one year's salary at minimum wage to GNP per capita; horizontal axis: GNP per capita, thousands of dollars.)

Source: World Bank, World Development Report 1995.


Empirical Exercises 

2.14. You are given the data in Table 2.7 for the United States for years 1980-2006. 

a. Plot the male civilian labor force participation rate against the male civilian unemployment rate. Eyeball a regression line through the scatter points. A priori, what is the expected relationship between the two, and what is the underlying economic theory? Does the scattergram support the theory?
TABLE 2.7 Participation Data for U.S. for 1980-2006

Year   CLFPRM¹   CLFPRF²   UNRM³   UNRF⁴   AHE82⁵   AHE⁶
1980    77.4      51.5      6.9     7.4     7.99     6.84
1981    77.0      52.1      7.4     7.9     7.88     7.43
1982    76.6      52.6      9.9     9.4     7.86     7.86
1983    76.4      52.9      9.9     9.2     7.95     8.19
1984    76.4      53.6      7.4     7.6     7.95     8.48
1985    76.3      54.5      7.0     7.4     7.91     8.73
1986    76.3      55.3      6.9     7.1     7.96     8.92
1987    76.2      56.0      6.2     6.2     7.86     9.13
1988    76.2      56.6      5.5     5.6     7.81     9.43
1989    76.4      57.4      5.2     5.4     7.75     9.80
1990    76.4      57.5      5.7     5.5     7.66    10.19
1991    75.8      57.4      7.2     6.4     7.58    10.50
1992    75.8      57.8      7.9     7.0     7.55    10.76
1993    75.4      57.9      7.2     6.6     7.52    11.03
1994    75.1      58.8      6.2     6.0     7.53    11.32
1995    75.0      58.9      5.6     5.6     7.53    11.64
1996    74.9      59.3      5.4     5.4     7.57    12.03
1997    75.0      59.8      4.9     5.0     7.68    12.49
1998    74.9      59.8      4.4     4.6     7.89    13.00
1999    74.7      60.0      4.1     4.3     8.00    13.47
2000    74.8      59.9      3.9     4.1     8.03    14.00
2001    74.4      59.8      4.8     4.7     8.11    14.53
2002    74.1      59.6      5.9     5.6     8.24    14.95
2003    73.5      59.5      6.3     5.7     8.27    15.35
2004    73.3      59.2      5.6     5.4     8.23    15.67
2005    73.3      59.3      5.1     5.1     8.17    16.11
2006    73.5      59.4      4.6     4.6     8.23    16.73

Source: Economic Report of the President. Table citations below refer to the source document.

1 CLFPRM, Civilian labor force participation rate, male (%), Table B-39, p. 277.
2 CLFPRF, Civilian labor force participation rate, female (%), Table B-39, p. 277.
3 UNRM, Civilian unemployment rate, male (%), Table B-42, p. 280.
4 UNRF, Civilian unemployment rate, female (%), Table B-42, p. 280.
5 AHE82, Average hourly earnings (1982 dollars), Table B-47, p. 286.
6 AHE, Average hourly earnings (current dollars), Table B-47, p. 286.





b. Repeat (a) for females. 

c. Now plot both the male and female labor participation rates against average hourly 
earnings (in 1982 dollars). (You may use separate diagrams.) Now what do you find? 
And how would you rationalize your finding? 

d. Can you plot the labor force participation rate against the unemployment rate and 
the average hourly earnings simultaneously? If not, how would you verbalize the 
relationship among the three variables? 

2.15. Table 2.8 gives data on expenditure on food and total expenditure, measured in 
rupees, for a sample of 55 rural households from India. (In early 2000, a U.S. dollar 
was about 40 Indian rupees.) 

a. Plot the data, using the vertical axis for expenditure on food and the horizontal axis for 
total expenditure, and sketch a regression line through the scatterpoints. 

b. What broad conclusions can you draw from this example? 







52 Part One Single-Equation Regression Models 


TABLE 2.8 Food and Total Expenditure (Rupees)

              Food          Total                      Food          Total
Observation   Expenditure   Expenditure   Observation  Expenditure   Expenditure
     1         217.0000      382.0000         29        390.0000      655.0000
     2         196.0000      388.0000         30        385.0000      662.0000
     3         303.0000      391.0000         31        470.0000      663.0000
     4         270.0000      415.0000         32        322.0000      677.0000
     5         325.0000      456.0000         33        540.0000      680.0000
     6         260.0000      460.0000         34        433.0000      690.0000
     7         300.0000      472.0000         35        295.0000      695.0000
     8         325.0000      478.0000         36        340.0000      695.0000
     9         336.0000      494.0000         37        500.0000      695.0000
    10         345.0000      516.0000         38        450.0000      720.0000
    11         325.0000      525.0000         39        415.0000      721.0000
    12         362.0000      554.0000         40        540.0000      730.0000
    13         315.0000      575.0000         41        360.0000      731.0000
    14         355.0000      579.0000         42        450.0000      733.0000
    15         325.0000      585.0000         43        395.0000      745.0000
    16         370.0000      586.0000         44        430.0000      751.0000
    17         390.0000      590.0000         45        332.0000      752.0000
    18         420.0000      608.0000         46        397.0000      752.0000
    19         410.0000      610.0000         47        446.0000      769.0000
    20         383.0000      616.0000         48        480.0000      773.0000
    21         315.0000      618.0000         49        352.0000      773.0000
    22         267.0000      623.0000         50        410.0000      775.0000
    23         420.0000      627.0000         51        380.0000      785.0000
    24         300.0000      630.0000         52        610.0000      788.0000
    25         410.0000      635.0000         53        530.0000      790.0000
    26         220.0000      640.0000         54        360.0000      795.0000
    27         403.0000      648.0000         55        305.0000      801.0000
    28         350.0000      650.0000







c. A priori, would you expect expenditure on food to increase linearly as total expenditure increases, regardless of the level of total expenditure? Why or why not? You can use total expenditure as a proxy for total income.

2.16. Table 2.9 gives data on mean Scholastic Aptitude Test (SAT) scores for college-bound seniors for 1972-2007. These data represent the critical reading and mathematics test scores for both male and female students. The writing category was introduced in 2006; therefore, writing scores are not included.

a. Use the horizontal axis for years and the vertical axis for SAT scores to plot the critical 
reading and math scores for males and females separately. 

b. What general conclusions do you draw from these graphs? 

c. Knowing the critical reading scores of males and females, how would you go about 
predicting their math scores? 

d. Plot the female math scores against the male math scores. What do you observe? 





Chapter 2 Two- Variable Regression Analysis: Some Basic Ideas 53 


TABLE 2.9 SAT Reasoning Test Scores: College-Bound Seniors, 1972-2007

            Critical Reading           Mathematics
Year      Male   Female   Total     Male   Female   Total
1972      531     529      530      527     489      509
1973      523     521      523      525     489      506
1974      524     520      521      524     488      505
1975      515     509      512      518     479      498
1976      511     508      509      520     475      497
1977      509     505      507      520     474      496
1978      511     503      507      517     474      494
1979      509     501      505      516     473      493
1980      506     498      502      515     473      492
1981      508     496      502      516     473      492
1982      509     499      504      516     473      493
1983      508     498      503      516     474      494
1984      511     498      504      518     478      497
1985      514     503      509      522     480      500
1986      515     504      509      523     479      500
1987      512     502      507      523     481      501
1988      512     499      505      521     483      501
1989      510     498      504      523     482      502
1990      505     496      500      521     483      501
1991      503     495      499      520     482      500
1992      504     496      500      521     484      501
1993      504     497      500      524     484      503
1994      501     497      499      523     487      504
1995      505     502      504      525     490      506
1996      507     503      505      527     492      508
1997      507     503      505      530     494      511
1998      509     502      505      531     496      512
1999      509     502      505      531     495      511
2000      507     504      505      533     498      514
2001      509     502      506      533     498      514
2002      507     502      504      534     500      516
2003      512     503      507      537     503      519
2004      512     504      508      537     501      518
2005      513     505      508      538     504      520
2006      505     502      503      536     502      518
2007      504     502      502      533     499      515

Source: College Board, 2007.

Note: For 1972-1986 a formula was applied to the original mean and standard deviation to convert the mean to the recentered scale. For 1987-1995 individual student scores were converted to the recentered scale and then the mean was recomputed. From 1996-1999, nearly all students received scores on the recentered scale. Any score on the original scale was converted to the recentered scale prior to computing the mean. From 2000-2007, all scores are reported on the recentered scale.


2.17. Table 2.10 presents data on mean SAT reasoning test scores classified by income for 
three kinds of tests: critical reading, mathematics, and writing. In Example 2.2, we 
presented Figure 2.7, which plotted mean math scores on mean family income. 
a. Refer to Figure 2.7 and prepare a similar graph relating average critical reading scores 
to average family income. Compare your results with those shown in Figure 2.7. 
TABLE 2.10 SAT Reasoning Test Classified by Family Income

Family          Number of     Critical Reading    Mathematics       Writing
Income ($)      Test Takers   Mean     SD         Mean     SD       Mean     SD
<10,000            40610      427      107        451      122      423      104
10000-20000        72745      453      106        472      113      446      102
20000-30000        61244      454      102        465      107      444       97
30000-40000        83685      476      103        485      106      466       98
40000-50000        75836      489      103        486      105      477       99
50000-60000        80060      497      102        504      104      486       98
60000-70000        75763      504      102        511      103      493       98
70000-80000        81627      508      101        516      103      498       98
80000-100000      130752      520      102        529      104      510      100
>100000           245025      544      105        556      107      537      103

Source: College Board, 2007 College-Bound Seniors, Table 11.


b. Repeat (a), relating average writing scores to average family income and compare your 
results with the other two graphs. 

c. Looking at the three graphs, what general conclusion can you draw? 









Chapter 3

Two-Variable Regression Model: The Problem of Estimation


As noted in Chapter 2, our first task is to estimate the population regression function (PRF) 
on the basis of the sample regression function (SRF) as accurately as possible. In Appendix A 
we have discussed two generally used methods of estimation: (1) ordinary least squares 
(OLS) and (2) maximum likelihood (ML). By and large, it is the method of OLS that is used 
extensively in regression analysis primarily because it is intuitively appealing and
mathematically much simpler than the method of maximum likelihood. Besides, as we will show
later, in the linear regression context the two methods generally give similar results. 


3.1 The Method of Ordinary Least Squares 


The method of ordinary least squares is attributed to Carl Friedrich Gauss, a German
mathematician. Under certain assumptions (discussed in Section 3.2), the method of least
squares has some very attractive statistical properties that have made it one of the most 
powerful and popular methods of regression analysis. To understand this method, we first 
explain the least-squares principle. 

Recall the two-variable PRF: 


$$Y_i = \beta_1 + \beta_2 X_i + u_i \tag{2.4.2}$$


However, as we noted in Chapter 2, the PRF is not directly observable. We estimate it from the SRF:

$$Y_i = \hat\beta_1 + \hat\beta_2 X_i + \hat{u}_i \tag{2.6.2}$$

$$Y_i = \hat{Y}_i + \hat{u}_i \tag{2.6.3}$$

where $\hat{Y}_i$ is the estimated (conditional mean) value of $Y_i$.

But how is the SRF itself determined? To see this, let us proceed as follows. First, 
express Equation 2.6.3 as 


$$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_1 - \hat\beta_2 X_i \tag{3.1.1}$$
FIGURE 3.1 Least-squares criterion.



which shows that the $\hat{u}_i$ (the residuals) are simply the differences between the actual and estimated Y values.

Now, given n pairs of observations on Y and X, we would like to determine the SRF in such a manner that it is as close as possible to the actual Y. To this end, we may adopt the following criterion: Choose the SRF in such a way that the sum of the residuals $\sum \hat{u}_i = \sum (Y_i - \hat{Y}_i)$ is as small as possible. Although intuitively appealing, this is not a very good criterion, as can be seen in the hypothetical scattergram shown in Figure 3.1.

If we adopt the criterion of minimizing $\sum \hat{u}_i$, Figure 3.1 shows that the residuals $\hat{u}_2$ and $\hat{u}_3$, as well as the residuals $\hat{u}_1$ and $\hat{u}_4$, receive the same weight in the sum $(\hat{u}_1 + \hat{u}_2 + \hat{u}_3 + \hat{u}_4)$, although the first two residuals are much closer to the SRF than the latter two. In other words, all the residuals receive equal importance no matter how close or how widely scattered the individual observations are from the SRF. A consequence of this is that it is quite possible that the algebraic sum of the $\hat{u}_i$ is small (even zero) although the $\hat{u}_i$ are widely scattered about the SRF. To see this, let $\hat{u}_1$, $\hat{u}_2$, $\hat{u}_3$, and $\hat{u}_4$ in Figure 3.1 assume the values of 10, -2, +2, and -10, respectively. The algebraic sum of these residuals is zero although $\hat{u}_1$ and $\hat{u}_4$ are scattered more widely around the SRF than $\hat{u}_2$ and $\hat{u}_3$. We can avoid this problem if we adopt the least-squares criterion, which states that the SRF can be fixed in such a way that

$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2 \tag{3.1.2}$$

is as small as possible, where $\hat{u}_i^2$ are the squared residuals. By squaring $\hat{u}_i$, this method gives more weight to residuals such as $\hat{u}_1$ and $\hat{u}_4$ in Figure 3.1 than to the residuals $\hat{u}_2$ and $\hat{u}_3$. As noted previously, under the minimum $\sum \hat{u}_i$ criterion, the sum can be small even though the $\hat{u}_i$ are widely spread about the SRF. But this is not possible under the least-squares procedure, for the larger the $\hat{u}_i$ (in absolute value), the larger the $\sum \hat{u}_i^2$. A further justification for the least-squares method lies in the fact that the estimators obtained by it have some very desirable statistical properties, as we shall see shortly.
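The arithmetic behind this argument is easy to reproduce. Using the hypothetical residual values 10, -2, +2, and -10 from the text, the plain sum is zero, while the sum of squares is dominated by the two large residuals:

```python
# Hypothetical residuals u1..u4 for Figure 3.1, as given in the text.
residuals = [10, -2, 2, -10]

sum_of_residuals = sum(residuals)                 # criterion: minimize the sum of u_i
sum_of_squares = sum(u ** 2 for u in residuals)   # least-squares criterion, Eq. (3.1.2)

# Share of the sum of squares contributed by the two widely scattered residuals.
share_of_large = (residuals[0] ** 2 + residuals[3] ** 2) / sum_of_squares

print(sum_of_residuals)          # 0
print(sum_of_squares)            # 208
print(round(share_of_large, 3))  # 0.962
```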
TABLE 3.1 Experimental Determination of the SRF

 Yᵢ     Xᵢ     Ŷ₁ᵢ      û₁ᵢ      û₁ᵢ²     Ŷ₂ᵢ    û₂ᵢ    û₂ᵢ²
 (1)    (2)    (3)      (4)      (5)      (6)    (7)    (8)
  4      1    2.929    1.071    1.147     4      0      0
  5      4    7.000   -2.000    4.000     7     -2      4
  7      5    8.357   -1.357    1.841     8     -1      1
 12      6    9.714    2.286    5.226     9      3      9
Sum: 28 16             0.0     12.214            0     14

Notes: Ŷ₁ᵢ = 1.572 + 1.357Xᵢ (i.e., β̂₁ = 1.572 and β̂₂ = 1.357)


It is obvious from Equation 3.1.2 that

$$\sum \hat{u}_i^2 = f(\hat\beta_1, \hat\beta_2) \tag{3.1.3}$$

that is, the sum of the squared residuals is some function of the estimators $\hat\beta_1$ and $\hat\beta_2$. For any given set of data, choosing different values for $\hat\beta_1$ and $\hat\beta_2$ will give different $\hat{u}$'s and hence different values of $\sum \hat{u}_i^2$. To see this clearly, consider the hypothetical data on Y and X given in the first two columns of Table 3.1. Let us now conduct two experiments. In experiment 1, let $\hat\beta_1 = 1.572$ and $\hat\beta_2 = 1.357$ (let us not worry right now about how we got these values; say, it is just a guess).¹ Using these $\hat\beta$ values and the X values given in column (2) of Table 3.1, we can easily compute the estimated $Y_i$ given in column (3) of the table as $\hat{Y}_{1i}$ (the subscript 1 denotes the first experiment). Now let us conduct another experiment, but this time using the values $\hat\beta_1 = 3$ and $\hat\beta_2 = 1$. The estimated values of $Y_i$ from this experiment are given as $\hat{Y}_{2i}$ in column (6) of Table 3.1. Since the $\hat\beta$ values in the two experiments are different, we get different values for the estimated residuals, as shown in the table; $\hat{u}_{1i}$ are the residuals from the first experiment and $\hat{u}_{2i}$ those from the second experiment. The squares of these residuals are given in columns (5) and (8). Obviously, as expected from Equation 3.1.3, these residual sums of squares are different since they are based on different sets of $\hat\beta$ values.
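Equation 3.1.3 can be made concrete by writing the residual sum of squares as a function of trial values of the two estimators and evaluating it at the two guesses used in Table 3.1. A minimal Python sketch (the function name is our own, chosen for illustration):

```python
# Table 3.1 data: columns (1) and (2).
Y = [4, 5, 7, 12]
X = [1, 4, 5, 6]


def residual_sum_of_squares(b1, b2, y, x):
    """Sum of squared residuals for the trial SRF Y_hat = b1 + b2 * X (Eq. 3.1.3)."""
    return sum((yi - (b1 + b2 * xi)) ** 2 for yi, xi in zip(y, x))


rss_experiment_1 = residual_sum_of_squares(1.572, 1.357, Y, X)
rss_experiment_2 = residual_sum_of_squares(3.0, 1.0, Y, X)

print(f"Experiment 1: {rss_experiment_1:.3f}")  # 12.214
print(f"Experiment 2: {rss_experiment_2:.3f}")  # 14.000
```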

Now which set of $\hat\beta$ values should we choose? Since the $\hat\beta$ values of the first experiment give us a lower $\sum \hat{u}_i^2$ (= 12.214) than that obtained from the $\hat\beta$ values of the second experiment (= 14), we might say that the $\hat\beta$'s of the first experiment are the “best” values. But how do we know? For, if we had infinite time and infinite patience, we could have conducted many more such experiments, choosing different sets of $\hat\beta$'s each time and comparing the resulting $\sum \hat{u}_i^2$, and then choosing that set of $\hat\beta$ values that gives us the least possible value of $\sum \hat{u}_i^2$, assuming of course that we have considered all the conceivable values of $\beta_1$ and $\beta_2$. But since time, and certainly patience, are generally in short supply, we need to consider some shortcuts to this trial-and-error process. Fortunately, the method of least squares provides us such a shortcut. The principle or the method of least squares chooses $\hat\beta_1$ and $\hat\beta_2$ in such a manner that, for a given sample or set of data, $\sum \hat{u}_i^2$ is as small as possible. In other words, for a given sample, the method of least squares provides us with unique estimates of $\beta_1$ and $\beta_2$ that give the smallest possible value of $\sum \hat{u}_i^2$. How is this accomplished? This is a straightforward exercise in differential calculus. As shown in Appendix 3A, Section 3A.1, the process of differentiation yields the following equations for estimating $\beta_1$ and $\beta_2$:

$$\sum Y_i = n\hat\beta_1 + \hat\beta_2 \sum X_i \tag{3.1.4}$$

$$\sum Y_i X_i = \hat\beta_1 \sum X_i + \hat\beta_2 \sum X_i^2 \tag{3.1.5}$$

where n is the sample size. These simultaneous equations are known as the normal equations.

1 For the curious, these values are obtained by the method of least squares, discussed shortly. See Eqs. (3.1.6) and (3.1.7).

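Solving the normal equations simultaneously, we obtain

$$\hat\beta_2 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2} \tag{3.1.6}$$

where $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y and where we define $x_i = (X_i - \bar{X})$ and $y_i = (Y_i - \bar{Y})$. Henceforth, we adopt the convention of letting the lowercase letters denote deviations from mean values.

$$\hat\beta_1 = \frac{\sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \bar{Y} - \hat\beta_2 \bar{X} \tag{3.1.7}$$

The last step in Equation 3.1.7 can be obtained directly from Eq. (3.1.4) by simple algebraic manipulations.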
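The sketch below applies Eqs. (3.1.6) and (3.1.7) to the Table 3.1 data, checks that the result satisfies both normal equations, and, for comparison, mimics the trial-and-error process described earlier by evaluating the residual sum of squares on a grid of candidate values. The grid range (0 to 3 for the intercept, 0 to 2 for the slope) and step (0.01) are arbitrary choices for illustration; the grid search is only a check, not how least-squares estimates are computed in practice.

```python
# Table 3.1 data.
Y = [4, 5, 7, 12]
X = [1, 4, 5, 6]
n = len(Y)


def rss(b1, b2):
    """Residual sum of squares for the trial line b1 + b2 * X."""
    return sum((yi - b1 - b2 * xi) ** 2 for yi, xi in zip(Y, X))


# Closed-form least-squares estimates, Eqs. (3.1.6) and (3.1.7).
x_bar = sum(X) / n
y_bar = sum(Y) / n
sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
b2_hat = sum_xy / sum_x2
b1_hat = y_bar - b2_hat * x_bar

# Both normal equations, Eqs. (3.1.4) and (3.1.5), should hold (left minus right = 0).
normal_eq_1 = sum(Y) - (n * b1_hat + b2_hat * sum(X))
normal_eq_2 = sum(yi * xi for yi, xi in zip(Y, X)) - (
    b1_hat * sum(X) + b2_hat * sum(xi ** 2 for xi in X)
)

# Brute-force trial and error: scan (b1, b2) pairs in steps of 0.01, keep the best.
grid = ((i / 100, j / 100) for i in range(301) for j in range(201))
b1_grid, b2_grid = min(grid, key=lambda pair: rss(*pair))

print(f"Formula: b1 = {b1_hat:.4f}, b2 = {b2_hat:.4f}, RSS = {rss(b1_hat, b2_hat):.4f}")
print(f"Grid:    b1 = {b1_grid:.2f}, b2 = {b2_grid:.2f}, RSS = {rss(b1_grid, b2_grid):.4f}")
```

The exact estimates are $\hat\beta_2 = 19/14 \approx 1.3571$ and $\hat\beta_1 = 11/7 \approx 1.5714$; the value 1.572 used in experiment 1 results from computing $\bar{Y} - \hat\beta_2\bar{X} = 7 - 4(1.357)$ with $\hat\beta_2$ rounded to 1.357. The grid minimum lands within a couple of grid steps of the closed-form values, and its residual sum of squares cannot be smaller than theirs.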

Incidentally, note that, by making use of simple algebraic identities,² formula (3.1.6) for estimating $\beta_2$ can be alternatively expressed as

$$\hat\beta_2 = \frac{\sum x_i Y_i}{\sum X_i^2 - n\bar{X}^2} = \frac{\sum X_i y_i}{\sum X_i^2 - n\bar{X}^2} \tag{3.1.8}$$


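2 Note 1: $\sum x_i^2 = \sum (X_i - \bar{X})^2 = \sum X_i^2 - 2\sum X_i \bar{X} + \sum \bar{X}^2 = \sum X_i^2 - 2\bar{X}\sum X_i + \sum \bar{X}^2$, since $\bar{X}$ is a constant. Further noting that $\sum X_i = n\bar{X}$ and $\sum \bar{X}^2 = n\bar{X}^2$ since $\bar{X}$ is a constant, we finally get

$$\sum x_i^2 = \sum X_i^2 - n\bar{X}^2$$

Note 2: $\sum x_i y_i = \sum x_i (Y_i - \bar{Y}) = \sum x_i Y_i - \bar{Y}\sum x_i = \sum x_i Y_i - \bar{Y}\sum (X_i - \bar{X}) = \sum x_i Y_i$, since $\bar{Y}$ is a constant and since the sum of deviations of a variable from its mean value [e.g., $\sum (X_i - \bar{X})$] is always zero. Likewise, $\sum y_i = \sum (Y_i - \bar{Y}) = 0$.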
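The identities in footnote 2 and the equivalence of Eqs. (3.1.6) and (3.1.8) can be confirmed numerically on the Table 3.1 data. This is a check on one data set, not a proof; variable names are our own.

```python
# Table 3.1 data.
Y = [4, 5, 7, 12]
X = [1, 4, 5, 6]
n = len(Y)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Lowercase letters: deviations from the sample means.
x = [Xi - x_bar for Xi in X]
y = [Yi - y_bar for Yi in Y]

# Note 1: sum of x_i^2 equals sum of X_i^2 minus n * Xbar^2.
sum_x_sq = sum(xi ** 2 for xi in x)
sum_X_sq_adj = sum(Xi ** 2 for Xi in X) - n * x_bar ** 2

# Note 2: sum of x_i y_i equals sum of x_i Y_i (and, likewise, sum of X_i y_i).
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xY = sum(xi * Yi for xi, Yi in zip(x, Y))
sum_Xy = sum(Xi * yi for Xi, yi in zip(X, y))

b2_deviation = sum_xy / sum_x_sq        # Eq. (3.1.6)
b2_alt_first = sum_xY / sum_X_sq_adj    # Eq. (3.1.8), first form
b2_alt_second = sum_Xy / sum_X_sq_adj   # Eq. (3.1.8), second form

print(sum_x_sq, sum_X_sq_adj)            # 14.0 14.0
print(sum_xy, sum_xY, sum_Xy)            # 19.0 19.0 19.0
print(b2_deviation, b2_alt_first, b2_alt_second)
```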
The estimators obtained previously are known as the least-squares estimators, for they are derived from the least-squares principle. Note the following numerical properties of estimators obtained by the method of OLS: “Numerical properties are those that hold as a consequence of the use of ordinary least squares, regardless of how the data were generated.”³ Shortly, we will also consider the statistical properties of OLS estimators, that is, properties “that hold only under certain assumptions about the way the data were generated.”⁴ (See the classical linear regression model in Section 3.2.)

I. The OLS estimators are expressed solely in terms of the observable (i.e., sample) quantities (i.e., X and Y). Therefore, they can be easily computed.

II. They are point estimators; that is, given the sample, each estimator will provide only 
a single (point) value of the relevant population parameter. (In Chapter 5 we will 
consider the so-called interval estimators, which provide a range of possible values 
for the unknown population parameters.) 

III. Once the OLS estimates are obtained from the sample data, the sample regression line (Figure 3.1) can be easily obtained. The regression line thus obtained has the following properties:

1. It passes through the sample means of Y and X. This fact is obvious from Eq. (3.1.7), for the latter can be written as $\bar{Y} = \hat\beta_1 + \hat\beta_2 \bar{X}$, which is shown diagrammatically in Figure 3.2.


FIGURE 3.2 Diagram showing that the sample regression line passes through the sample mean values of Y and X.

3 Russell Davidson and James C. MacKinnon, Estimation and Inference in Econometrics, Oxford 
University Press, New York, 1993, p. 3. 

4 Ibid.
2. The mean value of the estimated Y, $\hat{Y}_i$, is equal to the mean value of the actual Y, for

$$\begin{aligned} \hat{Y}_i &= \hat\beta_1 + \hat\beta_2 X_i \\ &= (\bar{Y} - \hat\beta_2 \bar{X}) + \hat\beta_2 X_i \\ &= \bar{Y} + \hat\beta_2 (X_i - \bar{X}) \end{aligned} \tag{3.1.9}$$

Summing both sides of this last equality over the sample values and dividing through by the sample size n gives⁵

$$\bar{\hat{Y}} = \bar{Y} \tag{3.1.10}$$

where use is made of the fact that $\sum (X_i - \bar{X}) = 0$. (Why?)

3. The mean value of the residuals $\hat{u}_i$ is zero. From Appendix 3A, Section 3A.1, the first equation is

$$-2\sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i) = 0$$

But since $\hat{u}_i = Y_i - \hat\beta_1 - \hat\beta_2 X_i$, the preceding equation reduces to $-2\sum \hat{u}_i = 0$, whence $\bar{\hat{u}} = 0$.⁶

As a result of the preceding property, the sample regression

$$Y_i = \hat\beta_1 + \hat\beta_2 X_i + \hat{u}_i \tag{2.6.2}$$

can be expressed in an alternative form where both Y and X are expressed as deviations from their mean values. To see this, sum (2.6.2) on both sides to give

$$\begin{aligned} \sum Y_i &= n\hat\beta_1 + \hat\beta_2 \sum X_i + \sum \hat{u}_i \\ &= n\hat\beta_1 + \hat\beta_2 \sum X_i \quad \text{since } \sum \hat{u}_i = 0 \end{aligned} \tag{3.1.11}$$

Dividing Equation 3.1.11 through by n, we obtain

$$\bar{Y} = \hat\beta_1 + \hat\beta_2 \bar{X} \tag{3.1.12}$$

which is the same as Eq. (3.1.7). Subtracting Equation 3.1.12 from Eq. (2.6.2), we obtain

$$Y_i - \bar{Y} = \hat\beta_2 (X_i - \bar{X}) + \hat{u}_i$$

or

$$y_i = \hat\beta_2 x_i + \hat{u}_i \tag{3.1.13}$$

where $y_i$ and $x_i$, following our convention, are deviations from their respective (sample) mean values.


5 Note that this result is true only when the regression model has the intercept term $\beta_1$ in it. As Appendix 6A, Sec. 6A.1 shows, this result need not hold when $\beta_1$ is absent from the model.

6 This result also requires that the intercept term $\beta_1$ be present in the model (see Appendix 6A, Sec. 6A.1).
Equation 3.1.13 is known as the deviation form. Notice that the intercept term $\hat\beta_1$ is no longer present in it. But the intercept term can always be estimated by Eq. (3.1.7), that is, from the fact that the sample regression line passes through the sample means of Y and X. An advantage of the deviation form is that it often simplifies computing formulas.

In passing, note that in the deviation form, the SRF can be written as

$$\hat{y}_i = \hat\beta_2 x_i \tag{3.1.14}$$

whereas in the original units of measurement it was $\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i$, as shown in Eq. (2.6.1).

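4. The residuals $\hat{u}_i$ are uncorrelated with the predicted $Y_i$. This statement can be verified as follows: using the deviation form, we can write

$$\begin{aligned} \sum \hat{y}_i \hat{u}_i &= \hat\beta_2 \sum x_i \hat{u}_i \\ &= \hat\beta_2 \sum x_i (y_i - \hat\beta_2 x_i) \\ &= \hat\beta_2 \sum x_i y_i - \hat\beta_2^2 \sum x_i^2 \\ &= \hat\beta_2^2 \sum x_i^2 - \hat\beta_2^2 \sum x_i^2 \\ &= 0 \end{aligned} \tag{3.1.15}$$

where use is made of the fact that $\hat\beta_2 = \sum x_i y_i / \sum x_i^2$.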

5. The residuals $\hat{u}_i$ are uncorrelated with $X_i$; that is, $\sum \hat{u}_i X_i = 0$. This fact follows from Eq. (2) in Appendix 3A, Section 3A.1.
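All five numerical properties, plus the deviation form (3.1.13), can be checked on the Table 3.1 data. The sketch below fits the line with Eqs. (3.1.6) and (3.1.7) and prints each property as a quantity that should be zero up to floating-point rounding. The labels and variable names are our own; a check on one data set illustrates the properties but does not prove them.

```python
# Table 3.1 data.
Y = [4, 5, 7, 12]
X = [1, 4, 5, 6]
n = len(Y)

# OLS estimates from Eqs. (3.1.6) and (3.1.7).
x_bar = sum(X) / n
y_bar = sum(Y) / n
x = [Xi - x_bar for Xi in X]
y = [Yi - y_bar for Yi in Y]
b2 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
b1 = y_bar - b2 * x_bar

Y_hat = [b1 + b2 * Xi for Xi in X]              # fitted values
u_hat = [Yi - Yh for Yi, Yh in zip(Y, Y_hat)]   # residuals, Eq. (3.1.1)
y_hat = [Yh - y_bar for Yh in Y_hat]            # fitted values in deviation form

# Each entry should be zero up to floating-point rounding.
checks = {
    "1. line passes through (Xbar, Ybar)": b1 + b2 * x_bar - y_bar,
    "2. mean(Y_hat) - mean(Y)": sum(Y_hat) / n - y_bar,
    "3. sum of residuals": sum(u_hat),
    "4. sum of y_hat_i * u_hat_i": sum(a * b for a, b in zip(y_hat, u_hat)),
    "5. sum of u_hat_i * X_i": sum(u * Xi for u, Xi in zip(u_hat, X)),
    "deviation form (3.1.13), max error": max(
        abs(yi - b2 * xi - u) for yi, xi, u in zip(y, x, u_hat)
    ),
}
for label, value in checks.items():
    print(f"{label}: {value:+.1e}")
```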

3.2 The Classical Linear Regression Model: The Assumptions 
Underlying the Method of Least Squares 

If our objective is to estimate $\beta_1$ and $\beta_2$ only, the method of OLS discussed in the preceding section will suffice. But recall from Chapter 2 that in regression analysis our objective is not only to obtain $\hat\beta_1$ and $\hat\beta_2$ but also to draw inferences about the true $\beta_1$ and $\beta_2$. For example, we would like to know how close $\hat\beta_1$ and $\hat\beta_2$ are to their counterparts in the population or how close $\hat{Y}_i$ is to the true $E(Y \mid X_i)$. To that end, we must not only specify the functional form of the model, as in Eq. (2.4.2), but also make certain assumptions about the manner in which the $Y_i$ are generated. To see why this requirement is needed, look at the PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$. It shows that $Y_i$ depends on both $X_i$ and $u_i$. Therefore, unless we are specific about how $X_i$ and $u_i$ are created or generated, there is no way we can make any statistical inference about the $Y_i$ and also, as we shall see, about $\beta_1$ and $\beta_2$. Thus, the assumptions made about the $X_i$ variable(s) and the error term are extremely critical to the valid interpretation of the regression estimates.

The Gaussian, standard, or classical linear regression model (CLRM), which is
the cornerstone of most econometric theory, makes 7 assumptions.⁷ We first discuss these
assumptions in the context of the two-variable regression model; and in Chapter 7 we 
extend them to multiple regression models, that is, models in which there is more than one 
regressor. 

7 It is classical in the sense that it was developed first by Gauss in 1821 and since then has served as a
norm or a standard against which may be compared the regression models that do not satisfy the 
Gaussian assumptions. 
ASSUMPTION 1 Linear Regression Model: The regression model is linear in the parameters, though it may or may not be linear in the variables. That is, the regression model is as shown in Eq. (2.4.2):

$$Y_i = \beta_1 + \beta_2 X_i + u_i \tag{2.4.2}$$

As will be discussed in Chapter 7, this model can be extended to include more explanatory variables.


We have already discussed model (2.4.2) in Chapter 2. Since linear-in-parameter regression models are the starting point of the CLRM, we will maintain this assumption for most of this book. 8 Keep in mind that the model may still be nonlinear in the variables: the regressand Y and the regressor X themselves may enter nonlinearly, as discussed in Chapter 2.
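To make the distinction concrete, here is a minimal Python sketch, assuming hypothetical data generated without error from $Y = 3 + 0.5X^2$ (the numbers are illustrative, not from the text). Because the model is linear in $\beta_1$ and $\beta_2$, defining the transformed regressor $Z_i = X_i^2$ lets the ordinary two-variable OLS formulas recover the parameters; the helper `ols` is a name introduced only for this sketch.

```python
# Sketch: a model nonlinear in X but linear in the parameters,
# Y_i = b1 + b2 * X_i**2 + u_i, is still estimated by ordinary OLS
# once we define the transformed regressor Z_i = X_i**2.

def ols(z, y):
    """Two-variable OLS in deviation form; returns (b1_hat, b2_hat)."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    b2 = sxy / sxx
    b1 = y_bar - b2 * z_bar
    return b1, b2

# Hypothetical data generated exactly from Y = 3 + 0.5 * X**2 (no error term),
# so OLS on Z = X**2 should recover the parameters exactly.
x = [1, 2, 3, 4, 5, 6]
y = [3 + 0.5 * xi ** 2 for xi in x]
z = [xi ** 2 for xi in x]

b1_hat, b2_hat = ols(z, y)
print(round(b1_hat, 6), round(b2_hat, 6))  # 3.0 0.5
```

The nonlinearity lives entirely in how the regressor is constructed; the estimator itself is unchanged.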


ASSUMPTION 2 Fixed X Values or X Values Independent of the Error Term: Values taken by the regressor X may be considered fixed in repeated samples (the case of fixed regressor) or they may be sampled along with the dependent variable Y (the case of stochastic regressor). In the latter case, it is assumed that the X variable(s) and the error term are independent, which in particular implies that $\operatorname{cov}(X_i, u_i) = 0$.


This can be explained in terms of our example given in Table 2.1 (page 35). Consider the 
various Y populations corresponding to the levels of income shown in the table. Keeping 
the value of income X fixed, say, at level $80, we draw at random a family and observe its 
weekly family consumption Y as, say, $60. Still keeping X at $80, we draw at random 
another family and observe its Y value at $75. In each of these drawings (i.e., repeated 
sampling), the value of X is fixed at $80. We can repeat this process for all the X values 
shown in Table 2.1. As a matter of fact, the sample data shown in Tables 2.4 and 2.5 were 
drawn in this fashion. 
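The repeated-sampling idea can be mimicked with a short simulation. This is a sketch under stated assumptions: the income grid, the true values $\beta_1 = 17$ and $\beta_2 = 0.6$, and the normal errors with standard deviation 5 are all hypothetical choices, not the data of Table 2.1. The X column is held fixed across 5,000 samples while the disturbances, and therefore Y, are redrawn; the average of the OLS slope estimates should sit close to the true $\beta_2$.

```python
import random
import statistics

# Hypothetical true parameters and error s.d. (assumptions for illustration).
BETA1, BETA2, SIGMA = 17.0, 0.6, 5.0
X_FIXED = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]  # assumed income grid

def ols_slope(x, y):
    """OLS slope in deviation form."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def draw_sample(rng):
    """One repeated sample: the same X values, freshly drawn u_i and hence Y_i."""
    return [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA) for xi in X_FIXED]

rng = random.Random(42)
slopes = [ols_slope(X_FIXED, draw_sample(rng)) for _ in range(5000)]
print(statistics.mean(slopes))  # close to the true 0.6
```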

Why do we assume that the X values are nonstochastic? Given that, in most social sciences, data usually are collected randomly on both the Y and X variables, it seems natural to assume the opposite, that the X variable, like the Y variable, is also random or stochastic. But initially we assume that the X variable(s) is nonstochastic for the following reasons:

First, this is done initially to simplify the analysis and to introduce the reader to the complexities of regression analysis gradually. Second, in experimental situations it may not be unrealistic to assume that the X values are fixed. For example, a farmer may divide his land into several parcels and apply different amounts of fertilizer to these parcels to see its effect on crop yield. Likewise, a department store may decide to offer different rates of discount on a product to see its effect on consumers. Sometimes we may want to fix the X values for a specific purpose. Suppose we are trying to find out the average weekly earnings of workers (Y) with various levels of education (X), as in the case of the data given in Table 2.6. In this case, the X variable can be considered fixed or nonrandom. Third, as we show in Chapter 13, even if the X variables are stochastic, the statistical results of linear regression based


8 However, a brief discussion of nonlinear-in-parameter regression models is given in Chapter 14 for 
the benefit of more advanced students. 
on the case of fixed regressors are also valid when the Xs are random, provided that some conditions are met. One condition is that the regressor X and the error term $u_i$ are independent. As James Davidson notes, “. . . this model [i.e., stochastic regressors] ‘mimics’ the fixed regressor model, and . . . many of the statistical properties of least squares in the fixed regressor model continue to hold.” 9

For all these reasons, we will first discuss the (fixed-regressor) CLRM in considerable 
detail. However, in Chapter 13 we will discuss the case of stochastic regressors in some 
detail and point out the occasions where we need to consider the stochastic regressor 
models. Incidentally, note that if the X variable(s) is stochastic, the resulting model is called 
the neo-classical linear regression model (NLRM), 10 in contrast to the CLRM, where the 
Xs are treated as fixed or nonrandom. For discussion purposes, we will call the former the 
stochastic regressor model and the latter the fixed regressor model. 


ASSUMPTION 3 Zero Mean Value of Disturbance $u_i$: Given the value of $X_i$, the mean, or expected, value of the random disturbance term $u_i$ is zero. Symbolically, we have

$$E(u_i \mid X_i) = 0 \qquad (3.2.1)$$

Or, if X is nonstochastic,

$$E(u_i) = 0$$


Assumption 3 states that the mean value of $u_i$, conditional upon the given $X_i$, is zero. Geometrically, this assumption can be pictured as in Figure 3.3, which shows a few values of the variable X and the Y populations associated with each of them. As shown, each Y


FIGURE 3.3 Conditional distribution of the disturbances $u_i$.



9 James Davidson, Econometric Theory, Blackwell Publishers, U.K., 2000, p. 10. 

10 A term due to Arthur S. Goldberger, A Course in Econometrics, Harvard University Press, Cambridge, 
MA, 1991, p. 264. 
population corresponding to a given X is distributed around its mean value (shown by the circled points on the PRF), with some Y values above the mean and some below it. The distances above and below the mean values are nothing but the $u_i$. Eq. (3.2.1) requires that the average or mean value of these deviations corresponding to any given X should be zero.

This assumption should not be difficult to comprehend in view of the discussion in Section 2.4 (see Eq. [2.4.5]). Assumption 3 simply says that the factors not explicitly included in the model, and therefore subsumed in $u_i$, do not systematically affect the mean value of Y; in other words, the positive $u_i$ values cancel out the negative $u_i$ values so that their average or mean effect on Y is zero. 11

In passing, note that the assumption $E(u_i \mid X_i) = 0$ implies that $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$. (Why?) Therefore, the two assumptions are equivalent.
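One way to answer the question is to take expectations of the PRF conditional on $X_i$, treating $\beta_1$, $\beta_2$, and $X_i$ as constants under that conditioning:

```latex
\begin{aligned}
E(Y_i \mid X_i) &= E(\beta_1 + \beta_2 X_i + u_i \mid X_i) \\
                &= \beta_1 + \beta_2 X_i + E(u_i \mid X_i) \\
                &= \beta_1 + \beta_2 X_i \quad \text{if } E(u_i \mid X_i) = 0 .
\end{aligned}
```

Reading the second line in reverse gives $E(u_i \mid X_i) = E(Y_i \mid X_i) - \beta_1 - \beta_2 X_i$, so $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$ also forces $E(u_i \mid X_i) = 0$; this is why the two statements are equivalent.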

It is important to point out that Assumption 3 implies that there is no specification bias or specification error in the model used in empirical analysis. In other words, the regression model is correctly specified. Leaving out important explanatory variables, including unnecessary variables, or choosing the wrong functional form of the relationship between the Y and X variables are some examples of specification error. We will discuss this topic in considerable detail in Chapter 13.

Note also that if the conditional mean of one random variable given another random variable is zero, the covariance between the two variables is zero and hence the two variables are uncorrelated. Assumption 3 therefore implies that $X_i$ and $u_i$ are uncorrelated. 12
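The step from a zero conditional mean to a zero covariance can be written out using the law of iterated expectations, $E(W) = E[E(W \mid X)]$, a standard result:

```latex
\begin{aligned}
E(u_i) &= E[E(u_i \mid X_i)] = E(0) = 0 \\
E(X_i u_i) &= E[X_i \, E(u_i \mid X_i)] = E(X_i \cdot 0) = 0 \\
\operatorname{cov}(X_i, u_i) &= E(X_i u_i) - E(X_i)E(u_i) = 0 - E(X_i) \cdot 0 = 0
\end{aligned}
```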

The reason for assuming that the disturbance term u and the explanatory variable(s) X are uncorrelated is simple. When we expressed the PRF as in Eq. (2.4.2), we assumed that X and u (which represents the influence of all omitted variables) have separate (and additive) influences on Y. But if X and u are correlated, it is not possible to assess their individual effects on Y. Thus, if X and u are positively correlated, X tends to increase when u increases and to decrease when u decreases. Similarly, if X and u are negatively correlated, X tends to increase when u decreases and to decrease when u increases. In situations like this it is quite possible that the error term actually includes some variables that should have been included as additional regressors in the model. This is why Assumption 3 is another way of stating that there is no specification error in the chosen regression model.
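A small simulation shows what goes wrong when X and u are correlated. It is a sketch under assumptions that are not in the text: the true model is taken to be $Y = 1 + 2X + 3W + e$, where W is a hypothetical omitted variable constructed as $W = 0.5X + \text{noise}$. Regressing Y on X alone pushes $3W$ into the error term, so X and u are positively correlated and the OLS slope settles near $2 + 3(0.5) = 3.5$ rather than the true 2.

```python
import random

def ols(x, y):
    """Two-variable OLS in deviation form; returns (b1_hat, b2_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

rng = random.Random(0)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical omitted variable W that moves with X: W = 0.5 X + noise.
w = [0.5 * xi + rng.gauss(0, 1) for xi in x]
# Assumed true model: Y = 1 + 2 X + 3 W + e. Regressing Y on X alone leaves
# 3 W inside the error term u, so X and u are positively correlated.
y = [1 + 2 * xi + 3 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

b1_hat, b2_hat = ols(x, y)
print(f"slope on X alone: {b2_hat:.2f}")  # near 2 + 3 * 0.5 = 3.5, not the true 2
```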


ASSUMPTION 4 Homoscedasticity or Constant Variance of $u_i$: The variance of the error, or disturbance, term is the same regardless of the value of X. Symbolically,

$$
\begin{aligned}
\operatorname{var}(u_i) &= E[u_i - E(u_i \mid X_i)]^2 \\
&= E(u_i^2 \mid X_i) && \text{because of Assumption 3} \\
&= E(u_i^2) && \text{if the } X_i \text{ are nonstochastic} \\
&= \sigma^2 && (3.2.2)
\end{aligned}
$$

where var stands for variance.


11 For a more technical reason why Assumption 3 is necessary, see E. Malinvaud, Statistical Methods of Econometrics, Rand McNally, Chicago, 1966, p. 75. See also Exercise 3.3.

12 The converse, however, is not true because correlation is a measure of linear association only. That is, even if $X_i$ and $u_i$ are uncorrelated, the conditional mean of $u_i$ given $X_i$ may not be zero. However, if $X_i$ and $u_i$ are correlated, $E(u_i \mid X_i)$ must be nonzero, violating Assumption 3. We owe this point to Stock and Watson. See James H. Stock and Mark W. Watson, Introduction to Econometrics, Addison-Wesley, Boston, 2003, pp. 104-105.
Eq. (3.2.2) states that the variance of $u_i$ for each $X_i$ (i.e., the conditional variance of $u_i$) is some positive constant number equal to $\sigma^2$. Technically, Eq. (3.2.2) represents the assumption of homoscedasticity, or equal (homo) spread (scedasticity) or equal variance. The word comes from the Greek verb skedanime, which means to disperse or scatter. Stated differently, Eq. (3.2.2) means that the Y populations corresponding to various X values have the same variance. Put simply, the variation around the regression line (which is the line of average relationship between Y and X) is the same across the X values; it neither increases nor decreases as X varies. Diagrammatically, the situation is as depicted in Figure 3.4.

In contrast, consider Figure 3.5, where the conditional variance of the Y population varies with X. This situation is known appropriately as heteroscedasticity, or unequal spread, or variance. Symbolically, in this situation, Eq. (3.2.2) can be written as

$$\operatorname{var}(u_i \mid X_i) = \sigma_i^2 \qquad (3.2.3)$$

Notice the subscript on $\sigma^2$ in Eq. (3.2.3), which indicates that the variance of the Y population is no longer constant.
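The contrast between Eqs. (3.2.2) and (3.2.3) can be sketched by drawing many disturbances at each of a few X values and computing their variance. The X levels and both variance schemes below are hypothetical: $\sigma^2 = 4$ in the homoscedastic case and $\sigma_i^2 = 0.25X_i^2$ in the heteroscedastic case. The first set of conditional variances should hover around 4 at every X, while the second should rise with X, mimicking Figures 3.4 and 3.5.

```python
import random

rng = random.Random(1)
X_LEVELS = [1, 2, 3, 4, 5]   # hypothetical X values
DRAWS = 20_000               # draws of u at each X level

def sample_variance(values):
    """Unbiased sample variance (divides by len - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def conditional_variances(sd_of):
    """Sample variance of u at each X; sd_of(x) is the assumed error s.d. at x."""
    return {
        x: sample_variance([rng.gauss(0.0, sd_of(x)) for _ in range(DRAWS)])
        for x in X_LEVELS
    }

# Homoscedastic case, Eq. (3.2.2): var(u | X) = sigma^2 = 4 at every X.
homo = conditional_variances(lambda x: 2.0)
# Heteroscedastic case, Eq. (3.2.3): var(u | X_i) = sigma_i^2 = 0.25 * X_i^2.
hetero = conditional_variances(lambda x: 0.5 * x)

for x in X_LEVELS:
    print(x, round(homo[x], 2), round(hetero[x], 2))
```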


FIGURE 3.4 Homoscedasticity.

FIGURE 3.5 Heteroscedasticity.
To make the difference between the two situations clear, let Y represent weekly consumption expenditure and X weekly income. Figures 3.4 and 3.5 show that as income increases, the average consumption expenditure also increases. But in Figure 3.4 the variance of consumption expenditure remains the same at all levels of income, whereas in Figure 3.5 it increases with increase in income. In other words, richer families on the average consume more than poorer families, but there is also more variability in the consumption expenditure of the former.

To understand the rationale behind this assumption, refer to Figure 3.5. As this figure shows, $\operatorname{var}(u \mid X_1) < \operatorname{var}(u \mid X_2) < \cdots < \operatorname{var}(u \mid X_i)$. Therefore, the likelihood is that the Y observations coming from the population with $X = X_1$ would be closer to the PRF than those coming from populations corresponding to $X = X_2$, $X = X_3$, and so on. In short, not all Y values corresponding to the various Xs will be equally reliable, reliability being judged by how closely or distantly the Y values are distributed around their means, that is, the points on the PRF. If this is in fact the case, would we not prefer to sample from those Y populations that are closer to their mean than from those that are widely spread? But doing so might restrict the variation we obtain across X values.

By invoking Assumption 4, we are saying that at this stage, all Y values corresponding to the various Xs are equally important. In Chapter 11 we shall see what happens if this is not the case, that is, where there is heteroscedasticity.

In passing, note that Assumption 4 implies that the conditional variances of $Y_i$ are also homoscedastic. That is,

$$\operatorname{var}(Y_i \mid X_i) = \sigma^2 \qquad (3.2.4)$$

Of course, the unconditional variance of Y is $\sigma_Y^2$. Later we will see the importance of distinguishing between conditional and unconditional variances of Y (see Appendix A for details of conditional and unconditional variances).


ASSUMPTION 5 No Autocorrelation between the Disturbances: Given any two X values, $X_i$ and $X_j$ ($i \neq j$), the correlation between any two $u_i$ and $u_j$ ($i \neq j$) is zero. In short, the observations are sampled independently. Symbolically,

$$\operatorname{cov}(u_i, u_j \mid X_i, X_j) = 0 \qquad (3.2.5)$$

$$\operatorname{cov}(u_i, u_j) = 0, \quad \text{if X is nonstochastic}$$

where i and j are two different observations and where cov means covariance.


In words, Eq. (3.2.5) postulates that the disturbances $u_i$ and $u_j$ are uncorrelated. Technically, this is the assumption of no serial correlation, or no autocorrelation. This means that, given $X_i$, the deviations of any two Y values from their mean value do not exhibit patterns such as those shown in Figures 3.6(a) and (b). In Figure 3.6(a), we see that the u's are positively correlated: a positive u is followed by a positive u, or a negative u by a negative u. In Figure 3.6(b), the u's are negatively correlated: a positive u is followed by a negative u and vice versa.

If the disturbances (deviations) follow systematic patterns, such as those shown in Figures 3.6(a) and (b), there is auto- or serial correlation, and what Assumption 5 requires is that such correlations be absent. Figure 3.6(c) shows that there is no systematic pattern to the u's, thus indicating zero correlation.
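Patterns like those of Figure 3.6 can be generated with a simple first-order autoregressive scheme, $u_t = \rho u_{t-1} + \varepsilon_t$. This scheme and the values $\rho = 0.8, -0.8, 0$ are assumptions chosen for illustration (Chapter 12 takes up autocorrelation in detail). The sample correlation between $u_t$ and $u_{t-1}$ should come out close to $\rho$ in each case.

```python
import random

def ar1_errors(rho, n, rng):
    """Disturbances u_t = rho * u_{t-1} + e_t with e_t ~ N(0, 1) (assumed AR(1) scheme)."""
    u = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0.0, 1.0))
    return u

def lag1_corr(u):
    """Sample correlation between u_t and u_{t-1}."""
    a, b = u[1:], u[:-1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(7)
results = {rho: lag1_corr(ar1_errors(rho, 10_000, rng)) for rho in (0.8, -0.8, 0.0)}
for rho, label in [(0.8, "(a) positive"), (-0.8, "(b) negative"), (0.0, "(c) zero")]:
    print(label, round(results[rho], 2))
```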
FIGURE 3.6 Patterns of correlation among the disturbances: (a) positive serial correlation; (b) negative serial correlation; (c) zero correlation.

The full import of this assumption will be explained thoroughly in Chapter 12. But intuitively one can explain this assumption as follows. Suppose in our PRF ($Y_t = \beta_1 + \beta_2 X_t + u_t$) that $u_t$ and $u_{t-1}$ are positively correlated. Then $Y_t$ depends not only on $X_t$ but also on $u_{t-1}$, for $u_{t-1}$ to some extent determines $u_t$. At this stage of the development of the subject matter, by invoking Assumption 5, we are saying that we will consider the systematic effect, if any, of $X_t$ on $Y_t$ and not worry about the other influences that might act on Y as a result of the possible intercorrelations among the u's. In Chapter 12, however, we will see how intercorrelations among the disturbances can be brought into the analysis and with what consequences.

But it should be added here that the justification of this assumption depends on the type 
of data used in the analysis. If the data are cross-sectional and are obtained as a random 
sample from the relevant population, this assumption can often be justified. However, if the 
data are time series, the assumption of independence is difficult to maintain, for successive 
observations of a time series, such as GDP, are highly correlated. But we will deal with this 
situation when we discuss time series econometrics later in the text. 


ASSUMPTION 6 The Number of Observations n Must Be Greater than the Number of 
Parameters to Be Estimated: Alternatively, the number of observations must be 
greater than the number of explanatory variables. 
This assumption is not as innocuous as it seems. In the hypothetical example of Table 3.1, imagine that we had only the first pair of observations on Y and X (4 and 1). From this single observation there is no way to estimate the two unknowns, $\beta_1$ and $\beta_2$. We need at least two pairs of observations to estimate the two unknowns. In a later chapter we will see the critical importance of this assumption.
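The role of Assumption 6 can be checked directly. In this sketch the second observation (X = 4, Y = 5) is hypothetical, and `fit_two_variable` is a helper written only for illustration. With one observation the slope formula breaks down; with two, OLS passes a line exactly through both points, leaving RSS = 0 and $n - 2 = 0$ degrees of freedom, so $\hat{\sigma}^2 = \sum \hat{u}_i^2/(n-2)$ of Eq. (3.3.5) cannot be computed. Estimating the coefficients and also gauging their precision therefore requires n to exceed the number of parameters.

```python
def fit_two_variable(x, y):
    """OLS fit returning (b1_hat, b2_hat, RSS, df) with df = n - 2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("sum of squared deviations of X is zero; slope not estimable")
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, rss, n - 2

# n = 1: only the first pair (X = 1, Y = 4); its deviation x_1 = X_1 - X_bar is 0.
try:
    fit_two_variable([1], [4])
    single_obs_ok = True
except ValueError as err:
    single_obs_ok = False
    print("n = 1:", err)

# n = 2: add a hypothetical second pair (X = 4, Y = 5).
b1, b2, rss, df = fit_two_variable([1, 4], [4, 5])
print(b1, b2, rss, df)  # the line passes through both points: RSS ~ 0, df = 0
```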


ASSUMPTION 7 The Nature of X Variables: The X values in a given sample must not all be the same. 

Technically, var (X) must be a positive number. Furthermore, there can be no outliers in 
the values of the X variable, that is, values that are very large in relation to the rest of the 
observations. 


The assumption that there is variability in the X values is also not as innocuous as it looks. Look at Eq. (3.1.6). If all the X values are identical, then $X_i = \bar{X}$ (Why?) and the denominator of that equation will be zero, making it impossible to estimate $\hat{\beta}_2$ and therefore $\hat{\beta}_1$. Intuitively, we readily see why this assumption is important. Looking at our family consumption expenditure example in Chapter 2, if there is very little variation in family income, we will not be able to explain much of the variation in the consumption expenditure. The reader should keep in mind that variation in both Y and X is essential to use regression analysis as a research tool. In short, the variables must vary!
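A tiny computation makes the zero-denominator point concrete. The two samples below are hypothetical, and `slope_denominator` is an illustrative helper that simply evaluates $\sum x_i^2 = \sum (X_i - \bar{X})^2$, the denominator of Eq. (3.1.6).

```python
def slope_denominator(x):
    """Denominator of Eq. (3.1.6): the sum of squared deviations of X from its mean."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)

same_x = [3, 3, 3, 3, 3]    # hypothetical sample with no variation in X
varied_x = [1, 2, 3, 4, 5]  # hypothetical sample with variation in X

print(slope_denominator(same_x))    # 0.0: the slope formula would divide by zero
print(slope_denominator(varied_x))  # 10.0
```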

The requirement that there are no outliers in the X values is to avoid the regression results being dominated by such outliers. If there are a few X values that are, say, 20 times the average of the X values, the estimated regression lines with or without such observations might be vastly different. Very often such outliers are the result of human errors of arithmetic or mixing samples from different populations. In Chapter 13 we will discuss this topic further.
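The effect of a single extreme X value can be seen with hypothetical numbers. Nine points lie exactly on the line $Y = 2 + 0.5X$; a tenth point at X = 100 (20 times the mean X of 5) with Y = 10 is then added. The data and the `ols` helper are illustrative assumptions, not taken from the text.

```python
def ols(x, y):
    """Two-variable OLS in deviation form; returns (b1_hat, b2_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

# Hypothetical clean data lying exactly on Y = 2 + 0.5 X for X = 1, ..., 9.
x = list(range(1, 10))
y = [2 + 0.5 * xi for xi in x]
_, slope_clean = ols(x, y)

# Add one hypothetical outlier: X = 100, i.e., 20 times the mean X of 5,
# with a Y value (10) far below the line's value of 52 at that X.
_, slope_outlier = ols(x + [100], y + [10])

print(slope_clean, round(slope_outlier, 4))  # 0.5 versus about 0.0611
```

One observation out of ten is enough to cut the estimated slope by roughly a factor of eight.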

Our discussion of the assumptions underlying the classical linear regression model is now complete. It is important to note that all of these assumptions pertain to the PRF only and not the SRF. But it is interesting to observe that the method of least squares discussed previously has some properties that are similar to the assumptions we have made about the PRF. For example, the finding that $\sum \hat{u}_i = 0$ and, therefore, $\bar{\hat{u}} = 0$, is akin to the assumption that $E(u_i \mid X_i) = 0$. Likewise, the finding that $\sum \hat{u}_i X_i = 0$ is similar to the assumption that $\operatorname{cov}(u_i, X_i) = 0$. It is comforting to note that the method of least squares thus tries to “duplicate” some of the assumptions we have imposed on the PRF.
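These two numerical properties hold for any sample fitted by OLS with an intercept. The sketch below uses made-up data; both sums come out as zero apart from floating-point rounding.

```python
# Hypothetical sample (not from the text).
X = [1, 2, 3, 4, 5, 6]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(X, Y))
      / sum((a - x_bar) ** 2 for a in X))
b1 = y_bar - b2 * x_bar
residuals = [b - b1 - b2 * a for a, b in zip(X, Y)]

sum_u = sum(residuals)                             # sample analogue of E(u_i | X_i) = 0
sum_uX = sum(u * a for u, a in zip(residuals, X))  # sample analogue of cov(u_i, X_i) = 0
print(f"{sum_u:.1e} {sum_uX:.1e}")  # both zero up to floating-point rounding
```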

Of course, the SRF does not duplicate all the assumptions of the CLRM. As we will show later, although $\operatorname{cov}(u_i, u_j) = 0$ ($i \neq j$) by assumption, it is not true that the sample $\operatorname{cov}(\hat{u}_i, \hat{u}_j) = 0$ ($i \neq j$). As a matter of fact, we will show later that the residuals are not only autocorrelated but are also heteroscedastic (see Chapter 12).

A Word about These Assumptions 

The million-dollar question is: How realistic are all these assumptions? The “reality of assumptions” is an age-old question in the philosophy of science. Some argue that it does not matter whether the assumptions are realistic; what matters are the predictions based on those assumptions. Notable among the proponents of the “irrelevance-of-assumptions thesis” is Milton Friedman. To him, unreality of assumptions is a positive advantage: “to be important... a hypothesis must be descriptively false in its assumptions.” 13

One may not subscribe to this viewpoint fully, but recall that in any scientific study we 
make certain assumptions because they facilitate the development of the subject matter in 
gradual steps, not because they are necessarily realistic in the sense that they replicate 

13 Milton Friedman, Essays in Positive Economics, University of Chicago Press, Chicago, 1953, p. 14. 
reality exactly. As one author notes, “. . . if simplicity is a desirable criterion of good theory, all good theories idealize and oversimplify outrageously.” 14

What we plan to do is first study the properties of the CLRM thoroughly, and then in 
later chapters examine in depth what happens if one or more of the assumptions of CLRM 
are not fulfilled. At the end of this chapter, we provide in Table 3.4 a guide to where one can 
find out what happens to the CLRM if a particular assumption is not satisfied. 

As a colleague pointed out to us, when we review research done by others, we need to consider whether the assumptions made by the researcher are appropriate to the data and problem. All too often, published research rests on implicit assumptions about the problem and data that are likely not correct, and its estimates depend on those assumptions. Clearly, a knowledgeable reader who recognizes these problems should adopt a skeptical attitude toward such research. The assumptions listed in Table 3.4 therefore provide a checklist for guiding our research and for evaluating the research of others.

With this backdrop, we are now ready to study the CLRM. In particular, we want to find 
out the statistical properties of OLS compared with the purely numerical properties 
discussed earlier. The statistical properties of OLS are based on the assumptions of CLRM 
already discussed and are enshrined in the famous Gauss-Markov theorem. But before we 
turn to this theorem, which provides the theoretical justification for the popularity of OLS, 
we first need to consider the precision or standard errors of the least-squares estimates. 

3.3 Precision or Standard Errors of Least-Squares Estimates 

From Eqs. (3.1.6) and (3.1.7), it is evident that least-squares estimates are a function of the sample data. But since the data are likely to change from sample to sample, the estimates will change ipso facto. Therefore, what is needed is some measure of “reliability” or precision of the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$. In statistics the precision of an estimate is measured by its standard error (se). 15 Given the Gaussian assumptions, it is shown in Appendix 3A, Section 3A.3 that the standard errors of the OLS estimates can be obtained as follows:


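$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2} \qquad (3.3.1)$$

$$\operatorname{se}(\hat{\beta}_2) = \frac{\sigma}{\sqrt{\sum x_i^2}} \qquad (3.3.2)$$

$$\operatorname{var}(\hat{\beta}_1) = \frac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2 \qquad (3.3.3)$$

$$\operatorname{se}(\hat{\beta}_1) = \sqrt{\frac{\sum X_i^2}{n \sum x_i^2}}\,\sigma \qquad (3.3.4)$$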


14 Mark Blaug, The Methodology of Economics: Or How Economists Explain, 2d ed., Cambridge 
University Press, New York, 1992, p. 92. 

15 The standard error is nothing but the standard deviation of the sampling distribution of the estimator, and the sampling distribution of an estimator is simply a probability or frequency distribution
of the estimator, that is, a distribution of the set of values of the estimator obtained from all possible 
samples of the same size from a given population. Sampling distributions are used to draw inferences 
about the values of the population parameters on the basis of the values of the estimators calculated 
from one or more samples. (For details, see Appendix A.) 
where var = variance and se = standard error and where $\sigma^2$ is the constant or homoscedastic variance of $u_i$ of Assumption 4.

All the quantities entering into the preceding equations except $\sigma^2$ can be estimated from the data. As shown in Appendix 3A, Section 3A.5, $\sigma^2$ itself is estimated by the following formula:

$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2} \qquad (3.3.5)$$

where $\hat{\sigma}^2$ is the OLS estimator of the true but unknown $\sigma^2$ and where the expression $n - 2$ is known as the number of degrees of freedom (df), $\sum \hat{u}_i^2$ being the sum of the residuals squared or the residual sum of squares (RSS). 16

Once $\sum \hat{u}_i^2$ is known, $\hat{\sigma}^2$ can be easily computed. $\sum \hat{u}_i^2$ itself can be computed either from Eq. (3.1.2) or from the following expression (see Section 3.5 for the proof):

$$\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2^2 \sum x_i^2 \qquad (3.3.6)$$

Compared with Eq. (3.1.2), Eq. (3.3.6) is easy to use, for it does not require computing $\hat{u}_i$ for each observation, although such a computation will be useful in its own right (as we shall see in Chapters 11 and 12).

Since

$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}$$

an alternative expression for computing $\sum \hat{u}_i^2$ is

$$\sum \hat{u}_i^2 = \sum y_i^2 - \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2} \qquad (3.3.7)$$

In passing, note that the positive square root of $\hat{\sigma}^2$,

$$\hat{\sigma} = \sqrt{\frac{\sum \hat{u}_i^2}{n - 2}} \qquad (3.3.8)$$

is known as the standard error of estimate or the standard error of the regression (se). It is simply the standard deviation of the Y values about the estimated regression line and is often used as a summary measure of the “goodness of fit” of the estimated regression line, a topic discussed in Section 3.5.

Earlier we noted that, given $X_i$, $\sigma^2$ represents the (conditional) variance of both $u_i$ and $Y_i$. Therefore, the standard error of the estimate can also be called the (conditional) standard deviation of $u_i$ and $Y_i$. Of course, as usual, $\sigma_Y^2$ and $\sigma_Y$ represent, respectively, the unconditional variance and unconditional standard deviation of Y.
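The formulas of this section can be strung together in a short Python sketch. The eight observations are hypothetical, not the data of any table in the text. The code computes the RSS three ways (directly from the residuals as in Eq. (3.1.2), and via Eqs. (3.3.6) and (3.3.7)), then $\hat{\sigma}^2$, $\hat{\sigma}$, and the estimated standard errors obtained by replacing $\sigma$ with $\hat{\sigma}$ in Eqs. (3.3.2) and (3.3.4).

```python
import math

# Hypothetical sample (not the book's data).
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [3.1, 4.9, 7.2, 8.8, 11.3, 12.9, 15.2, 16.8]

n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
x = [Xi - X_bar for Xi in X]          # deviations x_i = X_i - X_bar
y = [Yi - Y_bar for Yi in Y]          # deviations y_i = Y_i - Y_bar
sum_x2 = sum(xi ** 2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_y2 = sum(yi ** 2 for yi in y)

b2 = sum_xy / sum_x2                  # slope in deviation form
b1 = Y_bar - b2 * X_bar               # intercept

# RSS three ways: directly from the residuals, Eq. (3.3.6), and Eq. (3.3.7).
rss_direct = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))
rss_336 = sum_y2 - b2 ** 2 * sum_x2
rss_337 = sum_y2 - sum_xy ** 2 / sum_x2

sigma2_hat = rss_direct / (n - 2)     # Eq. (3.3.5), df = n - 2
sigma_hat = math.sqrt(sigma2_hat)     # standard error of the regression, Eq. (3.3.8)

# Estimated standard errors: Eqs. (3.3.2) and (3.3.4) with sigma replaced by sigma_hat.
se_b2 = sigma_hat / math.sqrt(sum_x2)
se_b1 = math.sqrt(sum(Xi ** 2 for Xi in X) / (n * sum_x2)) * sigma_hat

print(f"b1={b1:.4f} b2={b2:.4f} RSS={rss_direct:.4f} sigma_hat={sigma_hat:.4f}")
print(f"se(b1)={se_b1:.4f} se(b2)={se_b2:.4f}")
```

The three RSS values agree up to rounding, which is a quick sanity check on a hand calculation.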


16 The term number of degrees of freedom means the total number of observations in the sample (= n) less the number of independent (linear) constraints or restrictions put on them. In other words, it is the number of independent observations out of a total of n observations. For example, before the RSS (3.1.2) can be computed, $\hat{\beta}_1$ and $\hat{\beta}_2$ must first be obtained. These two estimates therefore put two restrictions on the RSS. Therefore, there are $n - 2$, not n, independent observations to compute the RSS. Following this logic, in the three-variable regression RSS will have $n - 3$ df, and for the k-variable model it will have $n - k$ df. The general rule is this: df = (n − number of parameters estimated).
Note the following features of the variances (and therefore the standard errors) of $\hat{\beta}_1$ and $\hat{\beta}_2$.

1. The variance of $\hat{\beta}_2$ is directly proportional to $\sigma^2$ but inversely proportional to $\sum x_i^2$. That is, given $\sigma^2$, the larger the variation in the X values, the smaller the variance of $\hat{\beta}_2$ and hence the greater the precision with which $\beta_2$ can be estimated. In short, given $\sigma^2$, if there is substantial variation in the X values, $\beta_2$ can be measured more accurately than when the $X_i$ do not vary substantially. Also, given $\sum x_i^2$, the larger $\sigma^2$ is, the larger the variance of $\hat{\beta}_2$. Note that as the sample size n increases, the number of terms in the sum $\sum x_i^2$ will increase. As n increases, the precision with which $\beta_2$ can be estimated also increases. (Why?)

2. The variance of $\hat{\beta}_1$ is directly proportional to $\sigma^2$ and $\sum X_i^2$ but inversely proportional to $\sum x_i^2$ and the sample size n.

3. Since $\hat{\beta}_1$ and $\hat{\beta}_2$ are estimators, they will not only vary from sample to sample but in a given sample they are likely to be dependent on each other, this dependence being measured by the covariance between them. It is shown in Appendix 3A, Section 3A.4 that

$$\operatorname{cov}(\hat{\beta}_1, \hat{\beta}_2) = -\bar{X}\operatorname{var}(\hat{\beta}_2)$$

Since $\operatorname{var}(\hat{\beta}_2)$ is always positive, as is the variance of any variable, the nature of the covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$ depends on the sign of $\bar{X}$. If $\bar{X}$ is positive, then as the formula shows, the covariance will be negative. Thus, if the slope coefficient $\beta_2$ is overestimated (i.e., the slope is too steep), the intercept coefficient $\beta_1$ will be underestimated (i.e., the intercept will be too small). Later on (especially in the chapter on multicollinearity, Chapter 10), we will see the utility of studying the covariances between the estimated regression coefficients.
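The three features can be checked numerically. This sketch assumes a fixed income grid, $\beta_1 = 17$, $\beta_2 = 0.6$, and $\sigma = 5$, none of which come from the text. It evaluates Eqs. (3.3.1) and (3.3.3) for the original X values and for the same values repeated twice (so the sample size doubles with the same spread), and it compares the covariance formula with a Monte Carlo estimate from 20,000 repeated samples. Because $\bar{X} = 170 > 0$ here, both covariance numbers should be negative and close to each other.

```python
import random

BETA1, BETA2, SIGMA = 17.0, 0.6, 5.0                       # assumed true values
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]      # assumed fixed regressor

def theoretical_vars(X, sigma2):
    """var(b2_hat), var(b1_hat), cov(b1_hat, b2_hat) from Eqs. (3.3.1), (3.3.3), feature 3."""
    n = len(X)
    X_bar = sum(X) / n
    sum_x2 = sum((Xi - X_bar) ** 2 for Xi in X)
    var_b2 = sigma2 / sum_x2
    var_b1 = sum(Xi ** 2 for Xi in X) / (n * sum_x2) * sigma2
    return var_b2, var_b1, -X_bar * var_b2

def ols(X, Y):
    """Two-variable OLS in deviation form; returns (b1_hat, b2_hat)."""
    n = len(X)
    X_bar, Y_bar = sum(X) / n, sum(Y) / n
    b2 = (sum((a - X_bar) * (b - Y_bar) for a, b in zip(X, Y))
          / sum((a - X_bar) ** 2 for a in X))
    return Y_bar - b2 * X_bar, b2

# Features 1 and 2: doubling the sample (same X spread) halves both variances.
var_b2, var_b1, cov_theory = theoretical_vars(X, SIGMA ** 2)
var_b2_2n, var_b1_2n, _ = theoretical_vars(X * 2, SIGMA ** 2)

# Feature 3: Monte Carlo check of cov(b1_hat, b2_hat) = -X_bar * var(b2_hat).
rng = random.Random(3)
draws = [ols(X, [BETA1 + BETA2 * Xi + rng.gauss(0.0, SIGMA) for Xi in X])
         for _ in range(20_000)]
m1 = sum(d[0] for d in draws) / len(draws)
m2 = sum(d[1] for d in draws) / len(draws)
cov_mc = sum((d[0] - m1) * (d[1] - m2) for d in draws) / (len(draws) - 1)

print(f"theory: {cov_theory:.4f}  simulated: {cov_mc:.4f}")  # both negative since X_bar > 0
```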

How do the variances and standard errors of the estimated regression coefficients 
enable one to judge the reliability of these estimates? This is a problem in statistical 
inference, and it will be pursued in Chapters 4 and 5. 

3.4 Properties of Least-Squares Estimators: The Gauss-Markov 
Theorem 17 


As noted earlier, given the assumptions of the classical linear regression model, the least-squares estimates possess some ideal or optimum properties. These properties are contained in the well-known Gauss-Markov theorem. To understand this theorem, we need to consider the best linear unbiasedness property of an estimator. 18 As explained in Appendix A, an estimator, say the OLS estimator $\hat{\beta}_2$, is said to be a best linear unbiased estimator (BLUE) of $\beta_2$ if the following hold:

1. It is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model.

17 Although known as the Gauss-Markov theorem, the least-squares approach of Gauss (1821) antedates the minimum-variance approach of Markov (1900).

18 The reader should refer to Appendix A for the importance of linear estimators as well as for a 
general discussion of the desirable properties of statistical estimators. 
2. It is unbiased, that is, its average or expected value, $E(\hat{\beta}_2)$, is equal to the true value, $\beta_2$.

3. It has minimum variance in the class of all such linear unbiased estimators; an unbiased 
estimator with the least variance is known as an efficient estimator. 

In the regression context it can be proved that the OLS estimators are BLUE. This is the 
gist of the famous Gauss-Markov theorem, which can be stated as follows: 


Gauss-Markov Theorem: Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance, that is, they are BLUE.


The proof of this theorem is sketched in Appendix 3A, Section 3A.6. The full import of 
the Gauss-Markov theorem will become clearer as we move along. It is sufficient to note 
here that the theorem has theoretical as well as practical importance. 19 
What all this means can be explained with the aid of Figure 3.7. 


FIGURE 3.7 Sampling distribution of OLS estimator $\hat{\beta}_2$ and alternative estimator $\beta_2^*$: (a) sampling distribution of $\hat{\beta}_2$, with $E(\hat{\beta}_2) = \beta_2$; (b) sampling distribution of $\beta_2^*$; (c) sampling distributions of $\hat{\beta}_2$ and $\beta_2^*$.


19 For example, it can be proved that any linear combination of the β's, such as ($\beta_1 - 2\beta_2$), can be estimated by ($\hat{\beta}_1 - 2\hat{\beta}_2$), and this estimator is BLUE. For details, see Henri Theil, Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, N.J., 1978, pp. 401-402. Note a technical point about the Gauss-Markov theorem: It provides only the sufficient (but not necessary) condition for OLS to be efficient. I am indebted to Michael McAleer of the University of Western Australia for bringing this point to my attention.
In Figure 3.7(a) we have shown the sampling distribution of the OLS estimator $\hat{\beta}_2$, that is, the distribution of the values taken by $\hat{\beta}_2$ in repeated sampling experiments (recall Table 3.1). For convenience we have assumed $\hat{\beta}_2$ to be distributed symmetrically (but more on this in Chapter 4). As the figure shows, the mean of the $\hat{\beta}_2$ values, $E(\hat{\beta}_2)$, is equal to the true $\beta_2$. In this situation we say that $\hat{\beta}_2$ is an unbiased estimator of $\beta_2$. In Figure 3.7(b) we have shown the sampling distribution of $\beta_2^*$, an alternative estimator of $\beta_2$ obtained by using another (i.e., other than OLS) method. For convenience, assume that $\beta_2^*$, like $\hat{\beta}_2$, is unbiased, that is, its average or expected value is equal to $\beta_2$. Assume further that both $\hat{\beta}_2$ and $\beta_2^*$ are linear estimators, that is, they are linear functions of Y. Which estimator, $\hat{\beta}_2$ or $\beta_2^*$, would you choose?

To answer this question, superimpose the two figures, as in Figure 3.7(c). It is obvious that although both $\hat{\beta}_2$ and $\beta_2^*$ are unbiased, the distribution of $\beta_2^*$ is more diffused or widespread around the mean value than the distribution of $\hat{\beta}_2$. In other words, the variance of $\beta_2^*$ is larger than the variance of $\hat{\beta}_2$. Now given two estimators that are both linear and unbiased, one would choose the estimator with the smaller variance because it is more likely to be close to $\beta_2$ than the alternative estimator. In short, one would choose the BLUE estimator.
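The comparison in Figure 3.7 can be imitated numerically. In this sketch the alternative estimator $\beta_2^*$ is a hypothetical choice made for illustration: the slope of the line through the first and last observations, $(Y_n - Y_1)/(X_n - X_1)$, which is linear in Y and unbiased when X is fixed and $E(u_i) = 0$. The parameter values and income grid are also assumptions. Both estimators should center on $\beta_2$, but for this design the theoretical variance of $\beta_2^*$ is about 2.04 times that of OLS, in line with what the Gauss-Markov theorem guarantees for any linear unbiased rival.

```python
import random
import statistics

BETA1, BETA2, SIGMA = 17.0, 0.6, 5.0                       # assumed true values
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]      # assumed fixed regressor
X_bar = sum(X) / len(X)
SUM_X2 = sum((Xi - X_bar) ** 2 for Xi in X)

def ols_slope(Y):
    """OLS slope: a linear function of Y with weights x_i / sum(x_i^2)."""
    return sum((Xi - X_bar) * Yi for Xi, Yi in zip(X, Y)) / SUM_X2

def endpoint_slope(Y):
    """Hypothetical alternative beta2*: slope through the first and last points.
    Also linear in Y and, with fixed X and E(u_i) = 0, unbiased."""
    return (Y[-1] - Y[0]) / (X[-1] - X[0])

rng = random.Random(11)
ols_draws, alt_draws = [], []
for _ in range(20_000):
    Y = [BETA1 + BETA2 * Xi + rng.gauss(0.0, SIGMA) for Xi in X]
    ols_draws.append(ols_slope(Y))
    alt_draws.append(endpoint_slope(Y))

print("means:", round(statistics.mean(ols_draws), 3), round(statistics.mean(alt_draws), 3))
print("variances:", statistics.variance(ols_draws), statistics.variance(alt_draws))
# Theory: var(OLS) = sigma^2 / sum(x_i^2); var(beta2*) = 2 sigma^2 / (X_n - X_1)^2.
```

Any other linear unbiased weighting of the $Y_i$ could be swapped in for `endpoint_slope`; under the CLRM assumptions none should beat the OLS variance.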

The Gauss-Markov theorem is remarkable in that it makes no assumptions about the probability distribution of the random variable $u_i$, and therefore of $Y_i$ (in the next chapter we will take this up). As long as the assumptions of CLRM are satisfied, the theorem holds. As a result, we need not look for another linear unbiased estimator, for we will not find such an estimator whose variance is smaller than that of the OLS estimator. Of course, if one or more of these assumptions do not hold, the theorem no longer applies. For example, if we consider nonlinear-in-the-parameter regression models (which are discussed in Chapter 14), we may be able to obtain estimators that perform better than the OLS estimators. Also, as we will show in the chapter on heteroscedasticity, if the assumption of homoscedastic variance is not fulfilled, the OLS estimators, although unbiased and consistent, are no longer minimum variance estimators even in the class of linear estimators.

The statistical properties that we have just discussed are known as finite sample properties: These properties hold regardless of the sample size on which the estimators are based. Later we will have occasions to consider the asymptotic properties, that is, properties that hold only if the sample size is very large (technically, infinite). A general discussion of finite-sample and large-sample properties of estimators is given in Appendix A.

3.5 The Coefficient of Determination $r^2$: A Measure of “Goodness of Fit”


Thus far we were concerned with the problem of estimating regression coefficients, their standard errors, and some of their properties. We now consider the goodness of fit of the fitted regression line to a set of data; that is, we shall find out how “well” the sample regression line fits the data. From Figure 3.1 it is clear that if all the observations were to lie on the regression line, we would obtain a “perfect” fit, but this is rarely the case. Generally, there will be some positive $\hat u_i$ and some negative $\hat u_i$. What we hope for is that these residuals around the regression line are as small as possible. The coefficient of determination $r^2$ (two-variable case) or $R^2$ (multiple regression) is a summary measure that tells how well the sample regression line fits the data.

Before we show how $r^2$ is computed, let us consider a heuristic explanation of $r^2$ in terms of a graphical device, known as the Venn diagram, or the Ballentine, as shown in Figure 3.8.²⁰

20 See Peter Kennedy, "Ballentine: A Graphical Aid for Econometrics," Australian Economic Papers, vol. 20, 1981, pp. 414-416. The name Ballentine is derived from the emblem of the well-known Ballantine beer with its circles.
FIGURE 3.8
The Ballentine view of $r^2$: (a) $r^2 = 0$; (f) $r^2 = 1$. [Panels (a) through (f): circles Y and X with progressively greater overlap.]


In this figure the circle Y represents variation in the dependent variable Y and the circle X represents variation in the explanatory variable X.²¹ The overlap of the two circles (the shaded area) indicates the extent to which the variation in Y is explained by the variation in X (say, via an OLS regression). The greater the overlap, the greater the share of the variation in Y that is explained by X. The $r^2$ is simply a numerical measure of this overlap. In the figure, as we move from left to right, the area of the overlap increases, that is, successively a greater proportion of the variation in Y is explained by X. In short, $r^2$ increases. When there is no overlap, $r^2$ is obviously zero, but when the overlap is complete, $r^2$ is 1, since 100 percent of the variation in Y is explained by X. As we shall show shortly, $r^2$ lies between 0 and 1.

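To compute this $r^2$, we proceed as follows: Recall that

$$Y_i = \hat Y_i + \hat u_i \qquad (2.6.3)$$

or in the deviation form

$$y_i = \hat y_i + \hat u_i \qquad (3.5.1)$$

where use is made of Eqs. (3.1.13) and (3.1.14). Squaring Equation 3.5.1 on both sides and summing over the sample, we obtain

$$\begin{aligned} \sum y_i^2 &= \sum \hat y_i^2 + \sum \hat u_i^2 + 2\sum \hat y_i \hat u_i \\ &= \sum \hat y_i^2 + \sum \hat u_i^2 \\ &= \hat\beta_2^2 \sum x_i^2 + \sum \hat u_i^2 \end{aligned} \qquad (3.5.2)$$

since $\sum \hat y_i \hat u_i = 0$ (why?) and $\hat y_i = \hat\beta_2 x_i$.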

The various sums of squares appearing in Equation 3.5.2 can be described as follows: $\sum y_i^2 = \sum (Y_i - \bar Y)^2$ = total variation of the actual Y values about their sample mean, which may be called the total sum of squares (TSS). $\sum \hat y_i^2 = \sum (\hat Y_i - \bar{\hat Y})^2 = \sum (\hat Y_i - \bar Y)^2 = \hat\beta_2^2 \sum x_i^2$ = variation of the estimated Y values about their mean ($\bar{\hat Y} = \bar Y$), which appropriately may be called the sum of squares due to regression [i.e., due to the explanatory variable(s)], or explained by regression, or simply the explained sum of squares


21 The terms variation and variance are different. Variation means the sum of squares of the deviations of a variable from its mean value. Variance is this sum of squares divided by the appropriate degrees of freedom. In short, variance = variation/df.
FIGURE 3.9
Breakdown of the variation of $Y_i$ into two components.



(ESS). $\sum \hat u_i^2$ = residual or unexplained variation of the Y values about the regression line, or simply the residual sum of squares (RSS). Thus, Eq. (3.5.2) is

$$\text{TSS} = \text{ESS} + \text{RSS} \qquad (3.5.3)$$


and shows that the total variation in the observed Y values about their mean value can be 
partitioned into two parts, one attributable to the regression line and the other to random 
forces because not all actual Y observations lie on the fitted line. Geometrically, we have 
Figure 3.9. 

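Now dividing Equation 3.5.3 by TSS on both sides, we obtain

$$1 = \frac{\text{ESS}}{\text{TSS}} + \frac{\text{RSS}}{\text{TSS}} = \frac{\sum (\hat Y_i - \bar Y)^2}{\sum (Y_i - \bar Y)^2} + \frac{\sum \hat u_i^2}{\sum (Y_i - \bar Y)^2} \qquad (3.5.4)$$

We now define $r^2$ as

$$r^2 = \frac{\sum (\hat Y_i - \bar Y)^2}{\sum (Y_i - \bar Y)^2} = \frac{\text{ESS}}{\text{TSS}} \qquad (3.5.5)$$

or, alternatively, as

$$r^2 = 1 - \frac{\sum \hat u_i^2}{\sum (Y_i - \bar Y)^2} = 1 - \frac{\text{RSS}}{\text{TSS}} \qquad (3.5.5a)$$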


The quantity $r^2$ thus defined is known as the (sample) coefficient of determination and is the most commonly used measure of the goodness of fit of a regression line. Verbally, $r^2$ measures the proportion or percentage of the total variation in Y explained by the regression model.
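The decomposition (3.5.3) and the two equivalent definitions (3.5.5) and (3.5.5a) are easy to check numerically. The Python sketch below fits a two-variable OLS line to a small made-up data set (the numbers are purely illustrative), confirms that the cross-product term $\sum \hat y_i \hat u_i$ vanishes, and computes $r^2$ both ways.

```python
import statistics


def fit_and_decompose(x, y):
    """Fit Y on X by OLS and return (TSS, ESS, RSS, cross_term)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    y_hat = [b1 + b2 * xi for xi in x]
    u_hat = [yi - fi for yi, fi in zip(y, y_hat)]
    tss = sum((yi - y_bar) ** 2 for yi in y)       # sum of y_i^2
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)   # sum of yhat_i^2 (mean of Y-hat equals Y-bar)
    rss = sum(ui ** 2 for ui in u_hat)             # sum of uhat_i^2
    cross = sum((fi - y_bar) * ui for fi, ui in zip(y_hat, u_hat))  # should be ~0
    return tss, ess, rss, cross


# Illustrative data (hypothetical, not from the text)
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.7]
tss, ess, rss, cross = fit_and_decompose(X, Y)
print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}, cross term = {cross:.2e}")
print(f"r^2 = ESS/TSS = {ess / tss:.4f};  1 - RSS/TSS = {1 - rss / tss:.4f}")
```

Up to floating-point rounding, the two printed $r^2$ values agree, and the cross term is zero, which is exactly what drops the $2\sum \hat y_i \hat u_i$ term from (3.5.2).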

Two properties of $r^2$ may be noted:

1. It is a nonnegative quantity. (Why?)

2. Its limits are $0 \le r^2 \le 1$. An $r^2$ of 1 means a perfect fit, that is, $\hat Y_i = Y_i$ for each i. On the other hand, an $r^2$ of zero means that there is no linear relationship between the regressand and the regressor whatsoever (i.e., $\hat\beta_2 = 0$). In this case, as Eq. (3.1.9) shows, $\hat Y_i = \hat\beta_1 = \bar Y$, that is, the best prediction of any Y value is simply its mean value. In this situation, therefore, the regression line will be parallel to the X axis.

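Although $r^2$ can be computed directly from its definition given in Equation 3.5.5, it can be obtained more quickly from the following formula:

$$r^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{\sum \hat y_i^2}{\sum y_i^2} = \frac{\hat\beta_2^2 \sum x_i^2}{\sum y_i^2} = \hat\beta_2^2 \left(\frac{\sum x_i^2}{\sum y_i^2}\right) \qquad (3.5.6)$$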


If we divide the numerator and the denominator of Equation 3.5.6 by the sample size n (or n − 1 if the sample size is small), we obtain

$$r^2 = \hat\beta_2^2 \left(\frac{S_x^2}{S_y^2}\right) \qquad (3.5.7)$$

where $S_y^2$ and $S_x^2$ are the sample variances of Y and X, respectively.
Since $\hat\beta_2 = \sum x_i y_i / \sum x_i^2$, Eq. (3.5.6) can also be expressed as

$$r^2 = \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \sum y_i^2} \qquad (3.5.8)$$
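The step from (3.5.6) to (3.5.8) is a direct substitution of the OLS slope formula. Written out, it is:

```latex
\begin{aligned}
r^2 &= \hat\beta_2^2 \, \frac{\sum x_i^2}{\sum y_i^2}
     = \left(\frac{\sum x_i y_i}{\sum x_i^2}\right)^2 \frac{\sum x_i^2}{\sum y_i^2} \\
    &= \frac{\left(\sum x_i y_i\right)^2}{\left(\sum x_i^2\right)^2} \cdot \frac{\sum x_i^2}{\sum y_i^2}
     = \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \, \sum y_i^2}
\end{aligned}
```

Because the final expression is symmetric in $x$ and $y$, it also shows that $r^2$ is unchanged if the roles of X and Y are interchanged.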



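In terms of $r^2$, ESS and RSS can be expressed as

$$\text{ESS} = r^2 \cdot \text{TSS} = r^2 \sum y_i^2 \qquad (3.5.9)$$

$$\text{RSS} = \text{TSS} - \text{ESS} = \text{TSS}\,(1 - \text{ESS}/\text{TSS}) = \sum y_i^2 \,(1 - r^2) \qquad (3.5.10)$$

Therefore, we can write

$$\text{TSS} = \text{ESS} + \text{RSS}, \quad \text{that is,} \quad \sum y_i^2 = r^2 \sum y_i^2 + (1 - r^2) \sum y_i^2 \qquad (3.5.11)$$

an expression that we will find very useful later.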
A quantity closely related to but conceptually quite different from $r^2$ is the coefficient of correlation, which, as noted in Chapter 1, is a measure of the degree of association between two variables. It can be computed either from

$$r = \pm\sqrt{r^2} \qquad (3.5.12)$$

or from its definition

$$r = \frac{n\sum X_iY_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} \qquad (3.5.13)$$

which is known as the sample correlation coefficient.²²

Some of the properties of r are as follows (see Figure 3.10):


1. It can be positive or negative, the sign depending on the sign of the term in the 
numerator of Equation 3.5.13, which measures the sample covariation of two variables. 

2. It lies between the limits of −1 and +1; that is, $-1 \le r \le 1$.

3. It is symmetrical in nature; that is, the coefficient of correlation between X and Y ($r_{XY}$) is the same as that between Y and X ($r_{YX}$).

4. It is independent of the origin and scale; that is, if we define $X_i^* = aX_i + c$ and $Y_i^* = bY_i + d$, where $a > 0$, $b > 0$, and c and d are constants, then r between $X^*$ and $Y^*$ is the same as that between the original variables X and Y.

5. If X and Y are statistically independent (see Appendix A for the definition), the correlation coefficient between them is zero; but if r = 0, it does not follow that the two variables are independent. In other words, zero correlation does not necessarily imply independence. [See Figure 3.10(h).]

6. It is a measure of linear association or linear dependence only; it has no meaning for describing nonlinear relations. Thus in Figure 3.10(h), $Y = X^2$ is an exact relationship yet r is zero. (Why?)

7. Although it is a measure of linear association between two variables, it does not 
necessarily imply any cause-and-effect relationship, as noted in Chapter 1. 
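Properties 3, 4, and 6 can be confirmed numerically. The Python sketch below implements the computational formula (3.5.13) directly; the data sets and the constants used for the change of origin and scale are arbitrary illustrative choices, not values from the text.

```python
import math


def corr(x, y):
    """Sample correlation coefficient via the raw-sum form of Eq. (3.5.13)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Illustrative data (hypothetical)
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = corr(X, Y)

# Property 3: symmetry, r_XY = r_YX
print(math.isclose(corr(Y, X), r))
# Property 4: X* = aX + c and Y* = bY + d with a, b > 0 leave r unchanged
print(math.isclose(corr([3 * a - 7 for a in X], [0.5 * b + 10 for b in Y]), r))
# Property 6: Y = X^2 on a symmetric range is an exact (nonlinear) relation, yet r = 0
grid = [float(v) for v in range(-3, 4)]
print(corr(grid, [v * v for v in grid]))
```

Note that the positivity conditions in property 4 matter: replacing X by −X flips the sign of r while leaving its magnitude unchanged.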

In the regression context, $r^2$ is a more meaningful measure than r, for the former tells us the proportion of variation in the dependent variable explained by the explanatory variable(s) and therefore provides an overall measure of the extent to which the variation in one variable determines the variation in the other. The latter has no such interpretation.²³ Moreover, as we shall see, the interpretation of r (= R) in a multiple regression model is of dubious value. We will have more to say about $r^2$ in Chapter 7.

In passing, note that the $r^2$ defined previously can also be computed as the squared coefficient of correlation between actual $Y_i$ and the estimated $Y_i$, namely, $\hat Y_i$. That is, using Eq. (3.5.13), we can write

$$r^2 = \frac{\left[\sum (Y_i - \bar Y)(\hat Y_i - \bar Y)\right]^2}{\sum (Y_i - \bar Y)^2 \sum (\hat Y_i - \bar Y)^2} \qquad (3.5.14)$$

22 The population correlation coefficient, denoted by $\rho$, is defined in Appendix A.

23 In regression modeling the underlying theory will indicate the direction of causality between Y and X, which, in the context of single-equation models, is generally from X to Y.

FIGURE 3.10
Correlation patterns (adapted from Henri Theil, Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, NJ, 1978, p. 86).


where $Y_i$ = actual Y, $\hat Y_i$ = estimated Y, and $\bar Y = \bar{\hat Y}$ = the mean of Y. For proof, see Exercise 3.15. Expression (3.5.14) justifies the description of $r^2$ as a measure of goodness of fit, for it tells how close the estimated Y values are to their actual values.
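Expression (3.5.14) can be checked on any data set: fit the line, compute the correlation between $Y_i$ and $\hat Y_i$, square it, and compare the result with ESS/TSS. The Python sketch below does this with hypothetical data and hand-written helper functions (`ols_fit` and `pearson` are illustrative names, not library functions).

```python
import math
import statistics


def ols_fit(x, y):
    """Return OLS (b1, b2) for the two-variable model."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    b2 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b2 * x_bar, b2


def pearson(u, v):
    """Sample correlation in deviation form."""
    u_bar, v_bar = statistics.fmean(u), statistics.fmean(v)
    num = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    den = math.sqrt(sum((a - u_bar) ** 2 for a in u) * sum((b - v_bar) ** 2 for b in v))
    return num / den


# Hypothetical data
X = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
Y = [4.2, 5.1, 7.9, 8.3, 10.6, 10.9, 13.8]
b1, b2 = ols_fit(X, Y)
Y_hat = [b1 + b2 * a for a in X]

y_bar = statistics.fmean(Y)
r2_def = sum((f - y_bar) ** 2 for f in Y_hat) / sum((b - y_bar) ** 2 for b in Y)  # ESS/TSS, Eq. (3.5.5)
r2_corr = pearson(Y, Y_hat) ** 2                                                   # Eq. (3.5.14)
print(f"ESS/TSS = {r2_def:.6f}, corr(Y, Y_hat)^2 = {r2_corr:.6f}")
```

Because $\hat Y_i$ is a linear function of $X_i$, the same number also equals the squared correlation between $Y_i$ and $X_i$, consistent with (3.5.12).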


3.6 A Numerical Example 

We illustrate the econometric theory developed so far by considering the data given in Table 2.6, which relates mean hourly wage (Y) and years of schooling (X). Basic labor economics theory tells us that, among many variables, education is an important determinant of wages.

In Table 3.2 we provide the necessary raw data to estimate the quantitative impact of 
education on wages. 
TABLE 3.2
Raw Data Based on Table 2.6

Obs   Y          X     x     y          x²     xy
1     4.4567      6    -6    -4.218     36     25.308
2     5.77        7    -5    -2.9047    25     14.5235
3     5.9787      8    -4    -2.696     16     10.784
4     7.3317      9    -3    -1.343      9     4.029
5     7.3182     10    -2    -1.3565     4     2.713
6     6.5844     11    -1    -2.0903     1     2.0903
7     7.8182     12     0    -0.8565     0     0
8     7.8351     13     1    -0.8396     1     -0.8396
9     11.0223    14     2     2.3476     4     4.6952
10    10.6738    15     3     1.9991     9     5.9973
11    10.8361    16     4     2.1614    16     8.6456
12    13.615     17     5     4.9403    25     24.7015
13    13.531     18     6     4.8563    36     29.1378
Sum   112.7712  156     0     0        182     131.7856

Obs   X²     Y²         Ŷ          û = Y - Ŷ    û²
1     36     19.86217   4.165294    0.291406    0.084917
2     49     33.2929    4.916863    0.853137    0.727843
3     64     35.74485   5.668432    0.310268    0.096266
4     81     53.75382   6.420001    0.911699    0.831195
5     100    53.55605   7.17157     0.14663     0.0215
6     121    43.35432   7.923139   -1.33874     1.792222
7     144    61.12425   8.674708   -0.85651     0.733606
8     169    61.38879   9.426277   -1.59118     2.531844
9     196    121.4911   10.17785    0.844454    0.713103
10    225    113.93     10.92941   -0.25562     0.065339
11    256    117.4211   11.68098   -0.84488     0.713829
12    289    185.3682   12.43255    1.182447    1.398181
13    324    183.088    13.18412    0.346878    0.120324
Sum   2054   1083.376   112.7712    0           9.83017

Note: $x_i = X_i - \bar X$; $y_i = Y_i - \bar Y$.

$\hat\beta_2 = \dfrac{\sum x_i y_i}{\sum x_i^2} = \dfrac{131.7856}{182.0} = 0.7240967$

$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X = 8.674708 - 0.7240967 \times 12 = -0.01445$

$\hat\sigma^2 = \dfrac{\sum \hat u_i^2}{n - 2} = \dfrac{9.83017}{11} = 0.893652$; $\hat\sigma = 0.945332$

$\operatorname{var}(\hat\beta_2) = \dfrac{\hat\sigma^2}{\sum x_i^2} = \dfrac{0.893652}{182.0} = 0.004910$; $\operatorname{se}(\hat\beta_2) = \sqrt{0.004910} = 0.070072$

$\operatorname{se}(\hat\beta_1) = \sqrt{0.868132} = 0.9317359$
FIGURE 3.11
Estimated regression line for wage-education data from Table 2.6. [Horizontal axis: Education.]

From the data given in this table, we obtain the estimated regression line as follows:

$$\hat Y_i = -0.0144 + 0.7240X_i \qquad (3.6.1)$$

Geometrically, the estimated regression line is as shown in Figure 3.11. 
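The coefficients in (3.6.1) can be reproduced directly from the Y and X columns of Table 3.2. The Python sketch below applies the deviation-form OLS formulas and Eq. (3.5.6). It uses only the standard library and is meant as an illustration, not as the software used to produce the results in the text.

```python
import statistics

# Table 3.2: mean hourly wage (Y) and years of schooling (X)
Y = [4.4567, 5.77, 5.9787, 7.3317, 7.3182, 6.5844, 7.8182,
     7.8351, 11.0223, 10.6738, 10.8361, 13.615, 13.531]
X = list(range(6, 19))  # 6, 7, ..., 18 years

x_bar, y_bar = statistics.fmean(X), statistics.fmean(Y)
x = [a - x_bar for a in X]   # deviations x_i
y = [b - y_bar for b in Y]   # deviations y_i

sum_xy = sum(a * b for a, b in zip(x, y))      # about 131.7856
sum_x2 = sum(a * a for a in x)                 # 182
b2 = sum_xy / sum_x2                           # slope
b1 = y_bar - b2 * x_bar                        # intercept
r2 = b2 ** 2 * sum_x2 / sum(b * b for b in y)  # Eq. (3.5.6)

print(f"Y-hat = {b1:.4f} + {b2:.4f} X,  r^2 = {r2:.3f}")
```

Up to rounding in the last digit, the printed intercept and slope match (3.6.1), and $r^2$ comes out at roughly 0.9.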

As we know, each point on the regression line gives an estimate of the mean value of Y corresponding to the chosen X value, that is, $\hat Y_i$ is an estimate of $E(Y \mid X_i)$. The value of $\hat\beta_2 = 0.7240$, which measures the slope of the line, shows that, within the sample range of X between 6 and 18 years of education, as X increases by 1, the estimated increase in mean hourly wages is about 72 cents. That is, each additional year of schooling, on average, increases hourly wages by about 72 cents.

The value of $\hat\beta_1 = -0.0144$, which is the intercept of the line, indicates the average level of wages when the level of education is zero. Such a literal interpretation of the intercept in the present case does not make any sense. How could there be negative wages? As we will see throughout this book, very often the intercept term has no viable practical meaning. Besides, a zero level of education lies outside the range of education observed in our sample. As we will see in Chapter 5, the estimated value of the intercept is not statistically different from zero.

The $r^2$ value of about 0.90 suggests that education explains about 90 percent of the variation in hourly wage. Considering that $r^2$ can be at most 1, our regression line fits the data very well. The coefficient of correlation, r = 0.9521, shows that wages and education are highly positively correlated.

Before we leave our example, note that our model is extremely simple. Labor economics theory tells us that, besides education, variables such as gender, race, location, labor unions, and language are also important factors in the determination of hourly wages. After we study multiple regression in Chapters 7 and 8, we will consider a more extended model of wage determination.
3.7 Illustrative Examples

EXAMPLE 3.1
Consumption-Income Relationship in the United States, 1960-2005


Let us revisit the consumption-income data given in Table 1.1 of the Introduction. We have already shown the data in Figure 1.3, along with the estimated regression line in Eq. (1.3.3). Now we provide the underlying OLS regression results, which were obtained from EViews 6. Note that Y = personal consumption expenditure (PCE) and X = gross domestic product (GDP), both measured in billions of 2000 dollars. In this example the data are time series data.

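$$\hat Y_t = -299.5913 + 0.7218X_t \qquad (3.7.1)$$

$\operatorname{var}(\hat\beta_1) = 827.4195$    $\operatorname{se}(\hat\beta_1) = 28.7649$
$\operatorname{var}(\hat\beta_2) = 0.0000195$    $\operatorname{se}(\hat\beta_2) = 0.004423$
$r^2 = 0.9983$    $\hat\sigma^2 = 73.56689$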


Equation 3.7.1 is the aggregate, or economywide, Keynesian consumption function. As this equation shows, the marginal propensity to consume (MPC) is about 0.72, suggesting that if real income (GDP) goes up by a dollar, the average personal consumption expenditure goes up by about 72 cents. According to Keynesian theory, MPC is expected to lie between 0 and 1.

The intercept value in this example is negative, which has no viable economic 
interpretation. Literally interpreted, it means that if the value of GDP were zero, the 
average level of personal consumption expenditure would be a negative value of about 
299 billion dollars. 

The $r^2$ value of 0.9983 means that about 99.8 percent of the variation in personal consumption expenditure is explained by variation in GDP. This value is quite high, considering that $r^2$ can at most be 1. As we will see throughout this book, in regressions involving time series data one generally obtains high $r^2$ values. We will explore the reasons behind this in the chapter on autocorrelation and also in the chapter on time series econometrics.


EXAMPLE 3.2
Food Expenditure in India


Refer to the data given in Table 2.8 of Exercise 2.15. The data relate to a sample of 55 rural 
households in India. The regressand in this example is expenditure on food and the 
regressor is total expenditure, a proxy for income, both figures in rupees. The data in this 
example are thus cross-sectional data. 

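On the basis of the given data, we obtained the following regression:

$$\widehat{\text{FoodExp}}_i = 94.2087 + 0.4368\,\text{TotalExp}_i \qquad (3.7.2)$$

$\operatorname{var}(\hat\beta_1) = 2560.9401$    $\operatorname{se}(\hat\beta_1) = 50.8563$
$\operatorname{var}(\hat\beta_2) = 0.0061$    $\operatorname{se}(\hat\beta_2) = 0.0783$
$r^2 = 0.3698$    $\hat\sigma^2 = 4469.6913$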


From Equation 3.7.2 we see that if total expenditure increases by 1 rupee, on average, expenditure on food goes up by about 44 paise (1 rupee = 100 paise). If total expenditure were zero, the average expenditure on food would be about 94 rupees. Again, such a mechanical interpretation of the intercept may not be meaningful. However, in this example one could argue that even if total expenditure is zero (e.g., because of loss of a job), people may still maintain some minimum level of food expenditure by borrowing money or by dissaving.

The $r^2$ value of about 0.37 means that only 37 percent of the variation in food expenditure is explained by the total expenditure. This might seem a rather low value, but as we will see throughout this text, in cross-sectional data, typically one obtains low $r^2$ values, possibly because of the diversity of the units in the sample. We will discuss this topic further in the chapter on heteroscedasticity (see Chapter 11).
EXAMPLE 3.3
Demand for Cellular Phones and Personal Computers in Relation to Per Capita Personal Income

Table 3.3 gives data on the number of cell phone subscribers and the number of personal computers (PCs), both per 100 persons, and the purchasing-power adjusted per capita income in dollars for a sample of 34 countries. Thus we have cross-sectional data. These data are for the year 2003 and are obtained from the Statistical Abstract of the United States, 2006.

Although cell phones and personal computers are used extensively in the United States, that is not the case in many countries. To see if per capita income is a factor in the use of cell phones and PCs, we regressed each of these means of communication on per capita income using the sample of 34 countries. The results are as follows:

TABLE 3.3
Number of Cellular Phone Subscribers per 100 Persons, Number of Personal Computers per 100 Persons, and Per Capita Income in Selected Countries for 2003

Source: Statistical Abstract of the United States, 2006, Table 1364 for data on cell phones and computers and Table 1327 for purchasing-power adjusted per capita income.

Country           Cell phones   PCs      Per capita income (dollars)
Argentina         17.76         8.2      11410
Australia         71.95         60.18    28780
Belgium           79.28         31.81    28920
Brazil            26.36         7.48     7510
Bulgaria          46.64         5.19     75.4
Canada            41.9          48.7     30040
China             21.48         2.76     4980
Colombia          14.13         4.93     6410
Czech Republic    96.46         17.74    15600
Ecuador           18.92         3.24     3940
Egypt             8.45          2.91     3940
France            69.59         34.71    27640
Germany           78.52         48.47    27610
Greece            90.23         8.17     19900
Guatemala         13.15         1.44     4090
Hungary           76.88         10.84    13840
India             2.47          0.72     2880
Indonesia         8.74          1.19     3210
Italy             101.76        23.07    26830
Japan             67.9          38.22    28450
Mexico            29.47         8.3      8980
Netherlands       76.76         46.66    28560
Pakistan          1.75          0.42     2040
Poland            45.09         14.2     11210
Russia            24.93         8.87     8950
Saudi Arabia      32.11         13.67    13230
South Africa      36.36         7.26     10130
Spain             91.61         19.6     22150
Sweden            98.05         62.13    26710
Switzerland       84.34         70.87    32220
Thailand          39.42         3.98     7450
U.K.              91.17         40.57    27690
U.S.              54.58         65.98    37750
Venezuela         27.3          6.09     4750
EXAMPLE 3.3 (Continued)

Demand for Cell Phones. Letting Y = number of cell phone subscribers and X = purchasing-power-adjusted per capita income, we obtained the following regression:

$$\hat Y_i = 14.4773 + 0.0022X_i \qquad (3.7.3)$$

$\operatorname{se}(\hat\beta_1) = 6.1523$;  $\operatorname{se}(\hat\beta_2) = 0.00032$
$r^2 = 0.6023$

The slope coefficient suggests that if per capita income goes up by, say, 1,000 dollars, on average, the number of cell phone subscribers goes up by about 2.2 per 100 persons. The intercept value of about 14.47 suggests that even if the per capita income is zero, the average number of cell phone subscribers is about 14 per 100 persons. Again, this interpretation may not have much meaning, for in our sample we do not have any country with zero per capita income. The $r^2$ value is moderately high. But notice that our sample includes a variety of countries with varying levels of income. In such a diverse sample we would not expect a very high $r^2$ value.

After we study Chapter 5, we will show how the estimated standard errors reported 
in Equation 3.7.3 can be used to assess the statistical significance of the estimated 
coefficients. 

Demand for Personal Computers. Although the prices of personal computers have come 
down substantially over the years, PCs are still not ubiquitous. An important determinant 
of the demand for personal computers is personal income. Another determinant is price, 
but we do not have comparative data on PC prices for the countries in our sample. 

Letting Y denote the number of PCs and X the per capita income, we have the following "partial" demand for PCs (partial because we do not have comparative price data or data on other variables that might affect the demand for PCs).

$$\hat Y_i = -6.5833 + 0.0018X_i \qquad (3.7.4)$$

$\operatorname{se}(\hat\beta_1) = 2.7437$;  $\operatorname{se}(\hat\beta_2) = 0.00014$

$r^2 = 0.8290$

As these results suggest, per capita personal income has a positive relationship to the demand for PCs. After we study Chapter 5, you will see that, statistically, per capita personal income is an important determinant of the demand for PCs. The negative value of the intercept in the present instance has no practical significance. Despite the diversity of our sample, the estimated $r^2$ value is quite high. The interpretation of the slope coefficient is that if per capita income increases by, say, 1,000 dollars, on average, the demand for personal computers goes up by about 2 units per 100 persons.

Even though the use of personal computers is spreading quickly, there are many countries that still use mainframe computers. Therefore, the total usage of computers in those countries may be much higher than that indicated by the sale of PCs.


3.8 A Note on Monte Carlo Experiments 

In this chapter we showed that under the assumptions of CLRM the least-squares estimators have certain desirable statistical features summarized in the BLUE property. In the appendix to this chapter we prove this property more formally. But in practice how does one know that the BLUE property holds? For example, how does one find out if the OLS estimators are unbiased? The answer is provided by the so-called Monte Carlo experiments, which are essentially computer simulation, or sampling, experiments.

To introduce the basic ideas, consider our two-variable PRF: 


$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (3.8.1)$$
A Monte Carlo experiment proceeds as follows:

1. Suppose the true values of the parameters are as follows: $\beta_1 = 20$ and $\beta_2 = 0.6$.

2. You choose the sample size, say $n = 25$.

3. You fix the values of X for each observation. In all you will have 25 X values.

4. Suppose you go to a random number table, choose 25 values, and call them $u_i$ (these days most statistical packages have built-in random number generators).²⁴

5. Since you know $\beta_1$, $\beta_2$, $X_i$, and $u_i$, using Equation 3.8.1 you obtain 25 $Y_i$ values.

6. Now using the 25 $Y_i$ values thus generated, you regress these on the 25 X values chosen in step 3, obtaining $\hat\beta_1$ and $\hat\beta_2$, the least-squares estimators.

7. Suppose you repeat this experiment 99 times, each time using the same $\beta_1$, $\beta_2$, and X values. Of course, the $u_i$ values will vary from experiment to experiment. Therefore, in all you have 100 experiments, thus generating 100 values each of $\hat\beta_1$ and $\hat\beta_2$. (In practice, many such experiments are conducted, sometimes 1000 to 2000.)

8. You take the averages of these 100 estimates and call them $\bar{\hat\beta}_1$ and $\bar{\hat\beta}_2$.

9. If these average values are about the same as the true values of $\beta_1$ and $\beta_2$ assumed in step 1, this Monte Carlo experiment “establishes” that the least-squares estimators are indeed unbiased. Recall that under CLRM $E(\hat\beta_1) = \beta_1$ and $E(\hat\beta_2) = \beta_2$.
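The nine steps translate almost line for line into code. The Python sketch below uses the values from steps 1 and 2 ($\beta_1 = 20$, $\beta_2 = 0.6$, $n = 25$) and 100 replications. The fixed X values (1 through 25), the normal disturbances with standard deviation 3, and the random seed are assumptions made only for illustration, in the spirit of footnote 24.

```python
import random
import statistics


def ols(x, y):
    """Two-variable OLS estimates (b1, b2) from the deviation-form formulas."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


def monte_carlo(beta1=20.0, beta2=0.6, n=25, reps=100, sigma=3.0, seed=42):
    """Steps 1-8: return the averages of the OLS estimates over `reps` samples."""
    rng = random.Random(seed)                   # reproducible draws
    x = [float(i) for i in range(1, n + 1)]     # step 3: X fixed (assumed 1..n)
    b1_draws, b2_draws = [], []
    for _ in range(reps):                       # step 7: repeat the experiment
        u = [rng.gauss(0.0, sigma) for _ in range(n)]          # step 4: disturbances
        y = [beta1 + beta2 * xi + ui for xi, ui in zip(x, u)]  # step 5: Eq. (3.8.1)
        b1, b2 = ols(x, y)                                     # step 6: regress Y on X
        b1_draws.append(b1)
        b2_draws.append(b2)
    return statistics.fmean(b1_draws), statistics.fmean(b2_draws)  # step 8


if __name__ == "__main__":
    avg_b1, avg_b2 = monte_carlo()
    # Step 9: compare the averages with the true values 20 and 0.6
    print(f"average b1 = {avg_b1:.3f} (true 20); average b2 = {avg_b2:.4f} (true 0.6)")
```

Raising `reps` to 1000 or 2000, as the text mentions, should bring the averages still closer to the true values, since the sampling error of an average shrinks roughly with the square root of the number of replications.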

These steps characterize the general nature of the Monte Carlo experiments. Such experiments are often used to study the statistical properties of various methods of estimating population parameters. They are particularly useful to study the behavior of estimators in small, or finite, samples. These experiments are also an excellent means of driving home the concept of repeated sampling that is the basis of most of classical statistical inference, as we shall see in Chapter 5. We shall provide several examples of Monte Carlo experiments by way of exercises for classroom assignment. (See Exercise 3.27.)


Summary and Conclusions


The important topics and concepts developed in this chapter can be summarized as follows. 

1. The basic framework of regression analysis is the CLRM. 

2. The CLRM is based on a set of assumptions. 

3. Based on these assumptions, the least-squares estimators take on certain properties summarized in the Gauss-Markov theorem, which states that in the class of linear unbiased estimators, the least-squares estimators have minimum variance. In short, they are BLUE.

4. The precision of OLS estimators is measured by their standard errors. In Chapters 4 and 5 we shall see how the standard errors enable one to draw inferences on the population parameters, the β coefficients.

5. The overall goodness of fit of the regression model is measured by the coefficient of determination, $r^2$. It tells what proportion of the variation in the dependent variable, or regressand, is explained by the explanatory variable, or regressor. This $r^2$ lies between 0 and 1; the closer it is to 1, the better is the fit.


24 In practice it is assumed that $u_i$ follows a certain probability distribution, say, normal, with certain parameters (e.g., the mean and variance). Once the values of the parameters are specified, one can easily generate the $u_i$ using statistical packages.





6. A concept related to the coefficient of determination is the coefficient of correlation, r. It is a measure of linear association between two variables and it lies between −1 and +1.

7. The CLRM is a theoretical construct or abstraction because it is based on a set of assumptions that may be stringent or “unrealistic.” But such abstraction is often necessary in the initial stages of studying any field of knowledge. Once the CLRM is mastered, one can find out what happens if one or more of its assumptions are not satisfied. The first part of this book is devoted to studying the CLRM. The other parts of the book consider the refinements of the CLRM. Table 3.4 gives the road map ahead.


TABLE 3.4
What Happens If the Assumptions of CLRM Are Violated?

Assumption Number   Type of Violation                                        Where to Study?
1                   Nonlinearity in parameters                               Chapter 14
2                   Stochastic regressor(s)                                  Chapter 13
3                   Nonzero mean of u_i                                      Introduction to Part II
4                   Heteroscedasticity                                       Chapter 11
5                   Autocorrelated disturbances                              Chapter 12
6                   Sample observations less than the number of regressors   Chapter 10
7                   Insufficient variability in regressors                   Chapter 10
8                   Multicollinearity*                                       Chapter 10
9                   Specification bias*                                      Chapters 13, 14
10**                Nonnormality of disturbances                             Chapter 13

*These assumptions will be introduced in Chapter 7, when we discuss the multiple regression model.

**Note: The assumption that the disturbances $u_i$ are normally distributed is not a part of the CLRM. But more on this in Chapter 4.


EXERCISES

Questions

3.1. Given the assumptions in column 1 of the table, show that the assumptions in column 2 are equivalent to them.

Assumptions of the Classical Model

(1)                                          (2)
$E(u_i \mid X_i) = 0$                        $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$
$\operatorname{cov}(u_i, u_j) = 0,\ i \ne j$           $\operatorname{cov}(Y_i, Y_j) = 0,\ i \ne j$
$\operatorname{var}(u_i \mid X_i) = \sigma^2$          $\operatorname{var}(Y_i \mid X_i) = \sigma^2$


3.2. Show that the estimates $\hat\beta_1 = 1.572$ and $\hat\beta_2 = 1.357$ used in the first experiment of Table 3.1 are in fact the OLS estimators.

3.3. According to Malinvaud (see footnote 11), the assumption that $E(u_i \mid X_i) = 0$ is quite important. To see this, consider the PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$. Now consider two situations: (i) $\beta_1 = 0$, $\beta_2 = 1$, and $E(u_i) = 0$; and (ii) $\beta_1 = 1$, $\beta_2 = 0$, and $E(u_i) = (X_i - 1)$. Now take the expectation of the PRF conditional upon X in the two preceding cases and see if you agree with Malinvaud about the significance of the assumption $E(u_i \mid X_i) = 0$.
3.4. Consider the sample regression

$$Y_i = \hat\beta_1 + \hat\beta_2 X_i + \hat u_i$$

Imposing the restrictions (i) $\sum \hat u_i = 0$ and (ii) $\sum \hat u_i X_i = 0$, obtain the estimators $\hat\beta_1$ and $\hat\beta_2$ and show that they are identical with the least-squares estimators given in Eqs. (3.1.6) and (3.1.7). This method of obtaining estimators is called the analogy principle. Give an intuitive justification for imposing restrictions (i) and (ii). (Hint: Recall the CLRM assumptions about $u_i$.) In passing, note that the analogy principle of estimating unknown parameters is also known as the method of moments in which sample moments (e.g., sample mean) are used to estimate population moments (e.g., the population mean). As noted in Appendix A, a moment is a summary statistic of a probability distribution, such as the expected value and variance.

3.5. Show that $r^2$ defined in (3.5.5) ranges between 0 and 1. You may use the Cauchy-Schwarz inequality, which states that for any random variables X and Y the following relationship holds true:

$$[E(XY)]^2 \le E(X^2)\,E(Y^2)$$

3.6. Let $\hat\beta_{yx}$ and $\hat\beta_{xy}$ represent the slopes in the regression of Y on X and X on Y, respectively. Show that

$$\hat\beta_{yx}\,\hat\beta_{xy} = r^2$$

where r is the coefficient of correlation between X and Y.

3.7. Suppose in Exercise 3.6 that $\hat\beta_{yx}\hat\beta_{xy} = 1$. Does it matter then if we regress Y on X or X on Y? Explain carefully.

3.8. Spearman’s rank correlation coefficient $r_s$ is defined as follows:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d = difference in the ranks assigned to the same individual or phenomenon and n = number of individuals or phenomena ranked. Derive $r_s$ from r defined in Eq. (3.5.13). Hint: Rank the X and Y values from 1 to n. Note that the sum of X and Y ranks is n(n + 1)/2 each and therefore their means are (n + 1)/2.

3.9. Consider the following formulations of the two-variable PRF:

Model I: $Y_i = \beta_1 + \beta_2 X_i + u_i$
Model II: $Y_i = \alpha_1 + \alpha_2 (X_i - \bar X) + u_i$

a. Find the estimators of $\beta_1$ and $\alpha_1$. Are they identical? Are their variances identical?

b. Find the estimators of $\beta_2$ and $\alpha_2$. Are they identical? Are their variances identical?

c. What is the advantage, if any, of model II over model I?

3.10. Suppose you run the following regression:

$$y_i = \hat\beta_1 + \hat\beta_2 x_i + \hat u_i$$

where, as usual, $y_i$ and $x_i$ are deviations from their respective mean values. What will be the value of $\hat\beta_1$? Why? Will $\hat\beta_2$ be the same as that obtained from Eq. (3.1.6)? Why?
3.11. Let $r_1$ = coefficient of correlation between n pairs of values $(Y_i, X_i)$ and $r_2$ = coefficient of correlation between n pairs of values $(aX_i + b,\ cY_i + d)$, where a, b, c, and d are constants (with a and c of the same sign). Show that $r_1 = r_2$ and hence establish the principle that the coefficient of correlation is invariant with respect to the change of scale and the change of origin.

Hint: Apply the definition of r given in Eq. (3.5.13).

Note: The operations $aX_i$, $X_i + b$, and $aX_i + b$ are known, respectively, as the change of scale, change of origin, and change of both scale and origin.

3.12. If r, the coefficient of correlation between n pairs of values $(X_i, Y_i)$, is positive, then determine whether each of the following statements is true or false:

a. r between $(-X_i, -Y_i)$ is also positive.

b. r between $(-X_i, Y_i)$ and that between $(X_i, -Y_i)$ can be either positive or negative.

c. Both the slope coefficients $\hat\beta_{yx}$ and $\hat\beta_{xy}$ are positive, where $\hat\beta_{yx}$ = slope coefficient in the regression of Y on X and $\hat\beta_{xy}$ = slope coefficient in the regression of X on Y.

3.13. If $X_1$, $X_2$, and $X_3$ are uncorrelated variables each having the same standard deviation, show that the coefficient of correlation between $X_1 + X_2$ and $X_2 + X_3$ is equal to $\tfrac{1}{2}$. Why is the correlation coefficient not zero?

3.14. In the regression $Y_i = \beta_1 + \beta_2 X_i + u_i$ suppose we multiply each X value by a constant, say, 2. Will it change the residuals and fitted values of Y? Explain. What if we add a constant value, say, 2, to each X value?

3.15. Show that Eq. (3.5.14) in fact measures the coefficient of determination. 

Hint: Apply the definition of r given in Eq. (3.5.13) and recall that $\sum y_i \hat y_i = \sum (\hat y_i + \hat u_i)\hat y_i = \sum \hat y_i^2$, and remember Eq. (3.5.6).

3.16. Explain with reason whether the following statements are true, false, or uncertain: 

a. Since the correlation between two variables, Y and X, can range from $-1$ to $+1$, this also means that cov(Y, X) also lies between these limits.

b. If the correlation between two variables is zero, it means that there is no relationship between the two variables whatsoever.

c. If you regress $Y_i$ on $\hat Y_i$ (i.e., actual Y on estimated Y), the intercept and slope values will be 0 and 1, respectively.
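Statement (c) can be explored with a quick experiment before working out why it holds or fails. The sketch assumes NumPy and hypothetical simulated data. `ols` is an illustrative helper that implements the two-variable least-squares formulas. The code fits Y on X, forms the fitted values $\hat Y_i$, and then regresses Y on $\hat Y$, printing the resulting intercept and slope.

```python
import numpy as np

def ols(x, y):
    """Two-variable OLS returning (intercept, slope)."""
    dx = x - x.mean()
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    return y.mean() - slope * x.mean(), slope

# Hypothetical simulated data, not from the text
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 20.0, size=40)
Y = 5.0 + 1.5 * X + rng.normal(0.0, 3.0, size=40)

b1, b2 = ols(X, Y)
Y_hat = b1 + b2 * X        # fitted values
c0, c1 = ols(Y_hat, Y)     # regress actual Y on fitted Y
print(c0, c1)
```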

3.17. Regression without any regressor. Suppose you are given the model: $Y_i = \beta_1 + u_i$.

Use OLS to find the estimator of $\beta_1$. What is its variance and the RSS? Does the estimated $\beta_1$ make intuitive sense? Now consider the two-variable model $Y_i = \beta_1 + \beta_2 X_i + u_i$. Is it worth adding $X_i$ to the model? If not, why bother with regression analysis?

Empirical Exercises 

3.18. In Table 3.5, you are given the ranks of 10 students in midterm and final examinations in statistics. Compute Spearman’s coefficient of rank correlation and interpret it.


TABLE 3.5

                         Student
Rank      A   B   C   D   E   F   G   H   I   J
Midterm   1   3   7  10   9   5   4   8   2   6
Final     3   2   8   7   9   6   5  10   1   4
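With the ranks in Table 3.5, Spearman's coefficient can be computed directly from the formula in Exercise 3.8. The sketch assumes NumPy and no tied ranks, which holds for Table 3.5. `spearman_rs` is an illustrative helper. As a cross-check, the code also computes Pearson's r on the ranks themselves. For untied ranks this gives the same number, which is the relationship Exercise 3.8 asks you to derive.

```python
import numpy as np

# Ranks from Table 3.5, students A through J
midterm = np.array([1, 3, 7, 10, 9, 5, 4, 8, 2, 6])
final = np.array([3, 2, 8, 7, 9, 6, 5, 10, 1, 4])

def spearman_rs(rank_x, rank_y):
    """r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)); valid when there are no tied ranks."""
    rank_x = np.asarray(rank_x)
    rank_y = np.asarray(rank_y)
    n = rank_x.size
    d = rank_x - rank_y          # rank differences for each student
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

rs = spearman_rs(midterm, final)
r_on_ranks = np.corrcoef(midterm, final)[0, 1]  # Pearson r applied to the ranks
print(rs, r_on_ranks)
```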
3.19. The relationship between nominal exchange rate and relative prices. From annual observations from 1985 to 2005, the following regression results were obtained, where Y = exchange rate of the Canadian dollar to the U.S. dollar (CD/$) and X = ratio of the U.S. consumer price index to the Canadian consumer price index; that is, X represents the relative prices in the two countries:

$$\hat Y_t = -0.912 + 2.250\,X_t \qquad r^2 = 0.440$$
se = 0.096

a. Interpret this regression. How would you interpret $r^2$?

b. Does the positive coefficient of $X_t$ make economic sense? What is the underlying economic theory?

c. Suppose we were to redefine X as the ratio of the Canadian CPI to the U.S. CPI. Would that change the sign of the coefficient of X? Why?

3.20. Table 3.6 gives data on indexes of output per hour (X) and real compensation per hour (Y) for the business and nonfarm business sectors of the U.S. economy for 1960-2005. The base year of the indexes is 1992 = 100, and the indexes are seasonally adjusted.

a. Plot Y against X for the two sectors separately. 

b. What is the economic theory behind the relationship between the two variables? 
Does the scattergram support the theory? 

c. Estimate the OLS regression of Y on X. Save the results for a further look after we 
study Chapter 5. 

3.21. From a sample of 10 observations, the following results were obtained: 

$\sum Y_i = 1{,}110 \qquad \sum X_i = 1{,}700 \qquad \sum X_i Y_i = 205{,}500$

$\sum X_i^2 = 322{,}000 \qquad \sum Y_i^2 = 132{,}100$

with coefficient of correlation r = 0.9758. But on rechecking these calculations it was found that two pairs of observations were recorded:

  Y     X
 90   120
140   220

instead of

  Y     X
 80   110
150   210


What will be the effect of this error on r? Obtain the correct r. 
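The effect of the recording error can be traced through the raw sums, because each sum changes only by the contributions of the two misrecorded pairs. In the sketch below, which uses only the standard library, `r_from_sums` and `correct_sums` are illustrative helpers. The first converts raw sums into deviation sums and then into r. The second removes the wrong pairs and adds the correct ones. Computing r from the reported sums as well provides a check on the reported value.

```python
import math

def r_from_sums(n, sx, sy, sxy, sxx, syy):
    """Pearson r from raw sums, using deviation sums such as sum(xy) - sum(x)sum(y)/n."""
    cxy = sxy - sx * sy / n
    cxx = sxx - sx ** 2 / n
    cyy = syy - sy ** 2 / n
    return cxy / math.sqrt(cxx * cyy)

def correct_sums(sums, wrong_pairs, right_pairs):
    """Subtract the misrecorded (Y, X) pairs from the raw sums and add the correct ones."""
    sx, sy, sxy, sxx, syy = sums
    for sign, pairs in ((-1, wrong_pairs), (1, right_pairs)):
        for y, x in pairs:
            sx += sign * x
            sy += sign * y
            sxy += sign * x * y
            sxx += sign * x * x
            syy += sign * y * y
    return sx, sy, sxy, sxx, syy

n = 10
reported = (1700, 1110, 205_500, 322_000, 132_100)  # (sum X, sum Y, sum XY, sum X^2, sum Y^2)
wrong = [(90, 120), (140, 220)]    # (Y, X) pairs as recorded
right = [(80, 110), (150, 210)]    # (Y, X) pairs as they should have been

corrected = correct_sums(reported, wrong, right)
print("r from reported sums: ", r_from_sums(n, *reported))
print("r from corrected sums:", r_from_sums(n, *corrected))
```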

3.22. Table 3.7 gives data on gold prices, the Consumer Price Index (CPI), and the New York Stock Exchange (NYSE) Index for the United States for the period 1974-2006.

The NYSE Index includes most of the stocks listed on the NYSE, some 1500-plus. 

a. Plot in the same scattergram gold prices, CPI, and the NYSE Index. 

b. An investment is supposed to be a hedge against inflation if its price and/or rate 
of return at least keeps pace with inflation. To test this hypothesis, suppose you 
decide to fit the following model, assuming the scatterplot in (a) suggests that this 
is appropriate: 


$$\text{Gold price}_t = \beta_1 + \beta_2\,\text{CPI}_t + u_t$$
$$\text{NYSE index}_t = \beta_1 + \beta_2\,\text{CPI}_t + u_t$$
TABLE 3.6
Productivity and Related Data, Business Sector, 1960-2005
(Index numbers, 1992 = 100; quarterly data seasonally adjusted)

       Output per Hour of All Persons¹   Real Compensation per Hour²,³
Year   Business     Nonfarm Business     Business     Nonfarm Business
       Sector       Sector               Sector       Sector
1960   48.9         51.9                 60.8         63.3
1961   50.6         53.5                 62.5         64.8
1962   52.9         55.9                 64.6         66.7
1963   55.0         57.8                 66.1         68.1
1964   56.8         59.6                 67.7         69.3
1965   58.8         61.4                 69.1         70.5
1966   61.2         63.6                 71.7         72.6
1967   62.5         64.7                 73.5         74.5
1968   64.7         66.9                 76.2         77.1
1969   65.0         67.0                 77.3         78.1
1970   66.3         68.0                 78.8         79.2
1971   69.0         70.7                 80.2         80.7
1972   71.2         73.1                 82.6         83.2
1973   73.4         75.3                 84.3         84.7
1974   72.3         74.2                 83.3         83.8
1975   74.8         76.2                 84.1         84.5
1976   77.1         78.7                 86.4         86.6
1977   78.5         80.0                 87.6         88.0
1978   79.3         81.0                 89.1         89.6
1979   79.3         80.7                 89.3         89.7
1980   79.2         80.6                 89.1         89.6
1981   80.8         81.7                 89.3         89.8
1982   80.1         80.8                 90.4         90.8
1983   83.0         84.5                 90.3         90.9
1984   85.2         86.1                 90.7         91.1
1985   87.1         87.5                 92.0         92.2
1986   89.7         90.2                 94.9         95.2
1987   90.1         90.6                 95.2         95.5
1988   91.5         92.1                 96.5         96.7
1989   92.4         92.8                 95.0         95.1
1990   94.4         94.5                 96.2         96.1
1991   95.9         96.1                 97.4         97.4
1992   100.0        100.0                100.0        100.0
1993   100.4        100.4                99.7         99.5
1994   101.3        101.5                99.0         99.1
1995   101.5        102.0                98.7         98.8
1996   104.5        104.7                99.4         99.4
1997   106.5        106.4                100.5        100.3
1998   109.5        109.4                105.2        104.9
1999   112.8        112.5                108.0        107.5
2000   116.1        115.7                112.0        111.5
2001   119.1        118.6                113.5        112.8
2002   124.0        123.5                115.7        115.1
2003   128.7        128.0                117.7        117.1
2004   132.7        131.8                119.0        118.2
2005   135.7        134.9                120.2        119.3

¹Output refers to real gross domestic product in the sector.
³Hourly compensation divided by the consumer price index for all urban consumers ...
Source: ... President, 2007, Table 4.
TABLE 3.7
Gold Prices, New York Stock Exchange Index, and Consumer Price Index for U.S. for 1974-2006

Year   Gold Price   NYSE       CPI
1974   159.26       463.54     49.3
1975   161.02       483.55     53.8
1976   124.84       575.85     56.9
1977   157.71       567.66     60.6
1978   193.22       567.81     65.2
1979   306.68       616.68     72.6
1980   612.56       720.15     82.4
1981   460.03       782.62     90.9
1982   375.67       728.84     96.5
1983   424.35       979.52     99.6
1984   360.48       977.33     103.9
1985   317.26       1142.97    107.6
1986   367.66       1438.02    109.6
1987   446.46       1709.79    113.6
1988   436.94       1585.14    118.3
1989   381.44       1903.36    124.0
1990   383.51       1939.47    130.7
1991   362.11       2181.72    136.2
1992   343.82       2421.51    140.3
1993   359.77       2638.96    144.5
1994   384.00       2687.02    148.2
1995   384.17       3078.56    152.4
1996   387.77       3787.20    156.9
1997   331.02       4827.35    160.5
1998   294.24       5818.26    163.0
1999   278.88       6546.81    166.6
2000   279.11       6805.89    172.2
2001   274.04       6397.85    177.1
2002   309.73       5578.89    179.9
2003   363.38       5447.46    184.0
2004   409.72       6612.62    188.9
2005   444.74       7349.00    195.3
2006   603.46       8357.99    201.6
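For part (b) of Exercise 3.22, the two regressions can be fitted directly from Table 3.7. The sketch assumes NumPy. The arrays are transcribed from the table, and `ols` is an illustrative helper that implements the two-variable least-squares formulas. The code only prints the estimates. Whether gold or stocks act as a hedge against inflation is left to your interpretation, and the linear specification is appropriate only if the scatterplot in part (a) supports it.

```python
import numpy as np

# Annual data transcribed from Table 3.7, 1974-2006
gold = np.array([159.26, 161.02, 124.84, 157.71, 193.22, 306.68, 612.56, 460.03,
                 375.67, 424.35, 360.48, 317.26, 367.66, 446.46, 436.94, 381.44,
                 383.51, 362.11, 343.82, 359.77, 384.00, 384.17, 387.77, 331.02,
                 294.24, 278.88, 279.11, 274.04, 309.73, 363.38, 409.72, 444.74,
                 603.46])
nyse = np.array([463.54, 483.55, 575.85, 567.66, 567.81, 616.68, 720.15, 782.62,
                 728.84, 979.52, 977.33, 1142.97, 1438.02, 1709.79, 1585.14,
                 1903.36, 1939.47, 2181.72, 2421.51, 2638.96, 2687.02, 3078.56,
                 3787.20, 4827.35, 5818.26, 6546.81, 6805.89, 6397.85, 5578.89,
                 5447.46, 6612.62, 7349.00, 8357.99])
cpi = np.array([49.3, 53.8, 56.9, 60.6, 65.2, 72.6, 82.4, 90.9, 96.5, 99.6,
                103.9, 107.6, 109.6, 113.6, 118.3, 124.0, 130.7, 136.2, 140.3,
                144.5, 148.2, 152.4, 156.9, 160.5, 163.0, 166.6, 172.2, 177.1,
                179.9, 184.0, 188.9, 195.3, 201.6])

def ols(x, y):
    """Two-variable OLS returning (intercept, slope)."""
    dx = x - x.mean()
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    return y.mean() - slope * x.mean(), slope

for name, series in (("Gold price", gold), ("NYSE index", nyse)):
    b1, b2 = ols(cpi, series)
    print(f"{name}: intercept = {b1:.3f}, slope on CPI = {b2:.3f}")
```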


3.23. Table 3.8 gives data on gross domestic product (GDP) for the United States for the 
years 1959-2005. 

a. Plot the GDP data in current and constant (i.e., 2000) dollars against time. 

b. Letting Y denote GDP and X time (measured chronologically starting with 1 for 
1959, 2 for 1960, through 47 for 2005), see if the following model fits the GDP 
data: 

$$Y_t = \beta_1 + \beta_2 X_t + u_t$$

Estimate this model for both current and constant-dollar GDP.

c. How would you interpret $\beta_2$?

d. If there is a difference between $\beta_2$ estimated for current-dollar GDP and that estimated for constant-dollar GDP, what explains the difference?

e. From your results what can you say about the nature of inflation in the United 
States over the sample period? 
TABLE 3.8
Nominal and Real Gross Domestic Product, 1959-2005
(billions of dollars, except as noted; quarterly data at seasonally adjusted annual rates; RGDP in billions of chained [2000] dollars)

Year   NGDP       RGDP
1959   506.6      2,441.3
1960   526.4      2,501.8
1961   544.7      2,560.0
1962   585.6      2,715.2
1963   617.7      2,834.0
1964   663.6      2,998.6
1965   719.1      3,191.1
1966   787.8      3,399.1
1967   832.6      3,484.6
1968   910.0      3,652.7
1969   984.6      3,765.4
1970   1,038.5    3,771.9
1971   1,127.1    3,898.6
1972   1,238.3    4,105.0
1973   1,382.7    4,341.5
1974   1,500.0    4,319.6
1975   1,638.3    4,311.2
1976   1,825.3    4,540.9
1977   2,030.9    4,750.5
1978   2,294.7    5,015.0
1979   2,563.3    5,173.4
1980   2,789.5    5,161.7
1981   3,128.4    5,291.7
1982   3,255.0    5,189.3
1983   3,536.7    5,423.8
1984   3,933.2    5,813.6
1985   4,220.3    6,053.7
1986   4,462.8    6,263.6
1987   4,739.5    6,475.1
1988   5,103.8    6,742.7
1989   5,484.4    6,981.4
1990   5,803.1    7,112.5
1991   5,995.9    7,100.5
1992   6,337.7    7,336.6
1993   6,657.4    7,532.7
1994   7,072.2    7,835.5
1995   7,397.7    8,031.7
1996   7,816.9    8,328.9
1997   8,304.3    8,703.5
1998   8,747.0    9,066.9
1999   9,268.4    9,470.3
2000   9,817.0    9,817.0
2001   10,128.0   9,890.7
2002   10,469.6   10,048.8
2003   10,960.8   10,301.0
2004   11,712.5   10,703.5
2005   12,455.8   11,048.6
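Part (b) of Exercise 3.23 regresses GDP on a time trend. The sketch below assumes NumPy and transcribes Table 3.8 into arrays. It fits $Y_t = \beta_1 + \beta_2 X_t + u_t$ separately for nominal and real GDP using an illustrative `ols` helper. The time variable is coded 1 for 1959 through 47 for 2005, as the exercise specifies. The code prints the estimates without interpreting them.

```python
import numpy as np

# Annual data transcribed from Table 3.8 (billions of dollars), 1959-2005
ngdp = np.array([506.6, 526.4, 544.7, 585.6, 617.7, 663.6, 719.1, 787.8, 832.6,
                 910.0, 984.6, 1038.5, 1127.1, 1238.3, 1382.7, 1500.0, 1638.3,
                 1825.3, 2030.9, 2294.7, 2563.3, 2789.5, 3128.4, 3255.0, 3536.7,
                 3933.2, 4220.3, 4462.8, 4739.5, 5103.8, 5484.4, 5803.1, 5995.9,
                 6337.7, 6657.4, 7072.2, 7397.7, 7816.9, 8304.3, 8747.0, 9268.4,
                 9817.0, 10128.0, 10469.6, 10960.8, 11712.5, 12455.8])
rgdp = np.array([2441.3, 2501.8, 2560.0, 2715.2, 2834.0, 2998.6, 3191.1, 3399.1,
                 3484.6, 3652.7, 3765.4, 3771.9, 3898.6, 4105.0, 4341.5, 4319.6,
                 4311.2, 4540.9, 4750.5, 5015.0, 5173.4, 5161.7, 5291.7, 5189.3,
                 5423.8, 5813.6, 6053.7, 6263.6, 6475.1, 6742.7, 6981.4, 7112.5,
                 7100.5, 7336.6, 7532.7, 7835.5, 8031.7, 8328.9, 8703.5, 9066.9,
                 9470.3, 9817.0, 9890.7, 10048.8, 10301.0, 10703.5, 11048.6])
t = np.arange(1, ngdp.size + 1)   # X = 1 for 1959, ..., 47 for 2005

def ols(x, y):
    """Two-variable OLS returning (intercept, slope)."""
    dx = x - x.mean()
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    return y.mean() - slope * x.mean(), slope

for name, gdp in (("Current-dollar GDP", ngdp), ("Constant-dollar GDP", rgdp)):
    b1, b2 = ols(t, gdp)
    print(f"{name}: intercept = {b1:.2f}, trend slope = {b2:.2f}")
```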




3.24. Using the data given in Table 1.1 of the Introduction, verify Eq. (3.7.1). 

3.25. For the SAT example given in Exercise 2.16 do the following: 

a. Plot the female reading score against the male reading score. 

b. If the scatterplot suggests that a linear relationship between the two seems 
appropriate, obtain the regression of female reading score on male reading score. 

c. If there is a relationship between the two reading scores, is the relationship 
causal? 

3.26. Repeat Exercise 3.25, substituting math scores for reading scores.

3.27. Monte Carlo study classroom assignment: Refer to the 10 X values given in Table 2.4. Let $\beta_1 = 25$ and $\beta_2 = 0.5$. Assume $u_i \sim N(0, 9)$, that is, $u_i$ are normally distributed with mean 0 and variance 9. Generate 100 samples using these values, obtaining 100 estimates of $\beta_1$ and $\beta_2$. Graph these estimates. What conclusions can you draw from the Monte Carlo study? Note: Most statistical packages now can generate random variables from most well-known probability distributions. Ask your instructor for help, in case you have difficulty generating such variables.
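A minimal version of this Monte Carlo experiment is sketched below, assuming NumPy. Table 2.4 is not reproduced here, so the 10 X values in the code are hypothetical placeholders and should be replaced by the table's values. The parameters $\beta_1 = 25$, $\beta_2 = 0.5$, and $u_i \sim N(0, 9)$ follow the exercise. The seed is arbitrary and only makes the run reproducible.

```python
import numpy as np

# Hypothetical stand-in for the 10 X values of Table 2.4; replace with the table's values
X = np.array([80.0, 100.0, 120.0, 140.0, 160.0, 180.0, 200.0, 220.0, 240.0, 260.0])
beta1, beta2 = 25.0, 0.5
sigma = 3.0            # u_i ~ N(0, 9), so the standard deviation is 3
n_samples = 100

rng = np.random.default_rng(2024)
x = X - X.mean()
estimates = np.empty((n_samples, 2))   # column 0: beta1_hat, column 1: beta2_hat
for s in range(n_samples):
    Y = beta1 + beta2 * X + rng.normal(0.0, sigma, size=X.size)
    b2 = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
    b1 = Y.mean() - b2 * X.mean()
    estimates[s] = (b1, b2)

print("mean of estimates:", estimates.mean(axis=0))   # compare with (25, 0.5)
# To graph, e.g. matplotlib: plt.hist(estimates[:, 1]) for the beta2_hat values
```

The averages of the 100 estimates should lie near the true values, which illustrates unbiasedness. A histogram of each column shows the sampling spread.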

3.28. Using the data given in Table 3.3, plot the number of cell phone subscribers against 
the number of personal computers in use. Is there any discernible relationship between the two? If so, how do you rationalize the relationship?
Appendix 3A


3A.1 Derivation of Least-Squares Estimates 

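Differentiating Eq. (3.1.2) partially with respect to $\hat\beta_1$ and $\hat\beta_2$, we obtain

$$\frac{\partial\left(\sum \hat u_i^2\right)}{\partial \hat\beta_1} = -2\sum\left(Y_i - \hat\beta_1 - \hat\beta_2 X_i\right) = -2\sum \hat u_i \tag{1}$$

$$\frac{\partial\left(\sum \hat u_i^2\right)}{\partial \hat\beta_2} = -2\sum\left(Y_i - \hat\beta_1 - \hat\beta_2 X_i\right)X_i = -2\sum \hat u_i X_i \tag{2}$$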

Setting these equations to zero, after algebraic simplification and manipulation, gives the estimators 
given in Eqs. (3.1.6) and (3.1.7). 
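Setting Eqs. (1) and (2) to zero yields two linear equations in $\hat\beta_1$ and $\hat\beta_2$ (the normal equations), which can also be solved numerically as a 2 × 2 system. The sketch assumes NumPy and a hypothetical simulated sample. It solves the system with `np.linalg.solve` and then evaluates both partial derivatives at the solution, which should be zero up to rounding.

```python
import numpy as np

# Hypothetical simulated sample, not data from the text
rng = np.random.default_rng(4)
X = rng.uniform(1.0, 10.0, size=25)
Y = 2.0 + 0.7 * X + rng.normal(0.0, 1.0, size=25)
n = X.size

# Setting (1) and (2) to zero gives the two normal equations:
#   n*b1      + b2*sum(X)   = sum(Y)
#   b1*sum(X) + b2*sum(X^2) = sum(X*Y)
A = np.array([[n, X.sum()], [X.sum(), np.sum(X ** 2)]])
rhs = np.array([Y.sum(), np.sum(X * Y)])
b1, b2 = np.linalg.solve(A, rhs)

resid = Y - b1 - b2 * X
grad = np.array([-2.0 * resid.sum(), -2.0 * np.sum(resid * X)])  # Eqs. (1) and (2)
print(b1, b2, grad)
```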


3A.2 Linearity and Unbiasedness Properties 
of Least-Squares Estimators 


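From Eq. (3.1.8) we have

$$\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum x_i Y_i}{\sum x_i^2} = \sum k_i Y_i \tag{3}$$

where

$$k_i = \frac{x_i}{\sum x_i^2}$$

which shows that $\hat\beta_2$ is a linear estimator because it is a linear function of Y; actually it is a weighted sum of the $Y_i$, with $k_i$ serving as the weights. It can similarly be shown that $\hat\beta_1$ too is a linear estimator.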
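Incidentally, note these properties of the weights $k_i$:

1. Since the $X_i$ are assumed to be nonstochastic, the $k_i$ are nonstochastic too.

2. $\sum k_i = 0$.

3. $\sum k_i^2 = 1\big/\sum x_i^2$.

4. $\sum k_i x_i = \sum k_i X_i = 1$.

These properties can be directly verified from the definition of $k_i$. For example,

$$\sum k_i = \sum\left(\frac{x_i}{\sum x_i^2}\right) = \frac{1}{\sum x_i^2}\sum x_i \qquad \text{since, for a given sample, } \textstyle\sum x_i^2 \text{ is known}$$

$$= 0 \qquad \text{since } \textstyle\sum x_i, \text{ the sum of deviations from the mean value, is always zero}$$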

Now substitute the PRF $Y_i = \beta_1 + \beta_2 X_i + u_i$ into Eq. (3) to obtain

$$\begin{aligned}\hat\beta_2 &= \sum k_i(\beta_1 + \beta_2 X_i + u_i)\\ &= \beta_1\sum k_i + \beta_2\sum k_i X_i + \sum k_i u_i\\ &= \beta_2 + \sum k_i u_i\end{aligned} \tag{4}$$

where use is made of the properties of $k_i$ noted earlier.
Now taking expectation of Eq. (4) on both sides and noting that the $k_i$, being nonstochastic, can be treated as constants, we obtain

$$E(\hat\beta_2) = \beta_2 + \sum k_i E(u_i) = \beta_2$$

since $E(u_i) = 0$ by assumption. Therefore, $\hat\beta_2$ is an unbiased estimator of $\beta_2$. Likewise, it can be proved that $\hat\beta_1$ is also an unbiased estimator of $\beta_1$.
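The four properties of the weights, and the fact that $\hat\beta_2$ is a weighted sum of the $Y_i$, can be checked numerically. The sketch assumes NumPy. The X and Y values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical fixed (nonstochastic) regressor values
X = np.array([2.0, 3.5, 5.0, 7.0, 8.5, 11.0])
x = X - X.mean()                 # deviations x_i
k = x / np.sum(x ** 2)           # least-squares weights k_i

print(k.sum())                               # property 2: zero up to rounding
print(np.sum(k ** 2), 1 / np.sum(x ** 2))    # property 3: the two numbers agree
print(np.sum(k * x), np.sum(k * X))          # property 4: both equal 1

# Linearity: beta2_hat equals the k-weighted sum of Y (hypothetical Y values)
Y = np.array([4.1, 5.0, 6.8, 8.9, 9.7, 13.2])
b2_weighted = np.sum(k * Y)
b2_formula = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
print(b2_weighted, b2_formula)
```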


3A.3 Variances and Standard Errors 
of Least-Squares Estimators 

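Now by the definition of variance, we can write

$$\begin{aligned}\operatorname{var}(\hat\beta_2) &= E[\hat\beta_2 - E(\hat\beta_2)]^2\\ &= E(\hat\beta_2 - \beta_2)^2 && \text{since } E(\hat\beta_2) = \beta_2\\ &= E\Big(\sum k_i u_i\Big)^2 && \text{using Eq. (4) above}\\ &= E\big(k_1^2u_1^2 + k_2^2u_2^2 + \cdots + k_n^2u_n^2 + 2k_1k_2u_1u_2 + \cdots + 2k_{n-1}k_nu_{n-1}u_n\big)\end{aligned} \tag{6}$$

Since by assumption, $E(u_i^2) = \sigma^2$ for each i and $E(u_iu_j) = 0$, $i \neq j$, it follows that

$$\begin{aligned}\operatorname{var}(\hat\beta_2) &= \sigma^2\sum k_i^2\\ &= \frac{\sigma^2}{\sum x_i^2} && \text{(using the definition of } k_i^2)\\ &= \text{Eq. (3.3.1)}\end{aligned} \tag{7}$$

The variance of $\hat\beta_1$ can be obtained following the same line of reasoning already given. Once the variances of $\hat\beta_1$ and $\hat\beta_2$ are obtained, their positive square roots give the corresponding standard errors.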


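3A.4 Covariance between $\hat\beta_1$ and $\hat\beta_2$

By definition,

$$\begin{aligned}\operatorname{cov}(\hat\beta_1, \hat\beta_2) &= E\{[\hat\beta_1 - E(\hat\beta_1)][\hat\beta_2 - E(\hat\beta_2)]\}\\ &= E(\hat\beta_1 - \beta_1)(\hat\beta_2 - \beta_2) && \text{(Why?)}\\ &= -\bar X\,E(\hat\beta_2 - \beta_2)^2\\ &= -\bar X\operatorname{var}(\hat\beta_2)\\ &= \text{Eq. (3.3.9)}\end{aligned} \tag{8}$$

where use is made of the fact that $\hat\beta_1 = \bar Y - \hat\beta_2\bar X$ and $E(\hat\beta_1) = \bar Y - \beta_2\bar X$, giving $\hat\beta_1 - E(\hat\beta_1) = -\bar X(\hat\beta_2 - \beta_2)$. Note: $\operatorname{var}(\hat\beta_2)$ is given in Eq. (3.3.1).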
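The variance and covariance results of Sections 3A.3 and 3A.4 can be checked by simulation. The idea is to hold X fixed, draw many samples of $u_i$, and compare the empirical variance of $\hat\beta_2$ and covariance of $(\hat\beta_1, \hat\beta_2)$ with $\sigma^2/\sum x_i^2$ and $-\bar X \operatorname{var}(\hat\beta_2)$. The sketch assumes NumPy and normal errors. X, the parameters, the seed, and the number of replications are hypothetical choices, and each row of `Y` is one sample.

```python
import numpy as np

# Hypothetical fixed X values and parameters, not from the text
X = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0])
beta1, beta2, sigma = 1.0, 0.5, 2.0
x = X - X.mean()
var_b2_theory = sigma ** 2 / np.sum(x ** 2)      # Eq. (3.3.1)
cov_theory = -X.mean() * var_b2_theory           # Eq. (8)

rng = np.random.default_rng(7)
reps = 50_000
Y = beta1 + beta2 * X + rng.normal(0.0, sigma, size=(reps, X.size))  # one sample per row
b2 = Y @ x / np.sum(x ** 2)        # valid because sum(x_i) = 0
b1 = Y.mean(axis=1) - b2 * X.mean()

var_b2_mc = b2.var()
cov_mc = np.cov(b1, b2)[0, 1]
print(var_b2_theory, var_b2_mc)
print(cov_theory, cov_mc)
```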

3A.5 The Least-Squares Estimator of $\sigma^2$


Recall that

$$Y_i = \beta_1 + \beta_2 X_i + u_i \tag{9}$$
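Therefore,

$$\bar Y = \beta_1 + \beta_2\bar X + \bar u \tag{10}$$

Subtracting Eq. (10) from Eq. (9) gives

$$y_i = \beta_2 x_i + (u_i - \bar u) \tag{11}$$

Also recall that

$$\hat u_i = y_i - \hat\beta_2 x_i \tag{12}$$

Therefore, substituting Eq. (11) into Eq. (12) yields

$$\hat u_i = \beta_2 x_i + (u_i - \bar u) - \hat\beta_2 x_i \tag{13}$$

Collecting terms, squaring, and summing on both sides, we obtain

$$\sum \hat u_i^2 = (\hat\beta_2 - \beta_2)^2\sum x_i^2 + \sum (u_i - \bar u)^2 - 2(\hat\beta_2 - \beta_2)\sum x_i(u_i - \bar u) \tag{14}$$

Taking expectations on both sides gives

$$\begin{aligned}E\Big(\sum \hat u_i^2\Big) &= \sum x_i^2\,E(\hat\beta_2 - \beta_2)^2 + E\Big[\sum (u_i - \bar u)^2\Big] - 2E\Big[(\hat\beta_2 - \beta_2)\sum x_i(u_i - \bar u)\Big]\\ &= \sum x_i^2\operatorname{var}(\hat\beta_2) + (n - 1)\operatorname{var}(u_i) - 2E\Big[\sum k_iu_i\sum x_iu_i\Big]\\ &= \sigma^2 + (n - 1)\sigma^2 - 2E\Big[\sum k_ix_iu_i^2\Big]\\ &= \sigma^2 + (n - 1)\sigma^2 - 2\sigma^2\\ &= (n - 2)\sigma^2\end{aligned} \tag{15}$$

where, in the last but one step, use is made of the definition of $k_i$ given in Eq. (3) and the relation given in Eq. (4). Also note that

$$\begin{aligned}E\Big[\sum (u_i - \bar u)^2\Big] &= E\Big[\sum u_i^2 - n\bar u^2\Big]\\ &= E\Big[\sum u_i^2 - \frac{1}{n}\Big(\sum u_i\Big)^2\Big]\\ &= n\sigma^2 - \frac{1}{n}\,n\sigma^2 = (n - 1)\sigma^2\end{aligned}$$

where use is made of the fact that the $u_i$ are uncorrelated and the variance of each $u_i$ is $\sigma^2$. Thus, we obtain

$$E\Big(\sum \hat u_i^2\Big) = (n - 2)\sigma^2 \tag{16}$$

Therefore, if we define

$$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n - 2} \tag{17}$$

its expected value is

$$E(\hat\sigma^2) = \frac{1}{n - 2}E\Big(\sum \hat u_i^2\Big) = \sigma^2 \qquad \text{using Eq. (16)} \tag{18}$$

which shows that $\hat\sigma^2$ is an unbiased estimator of true $\sigma^2$.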
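Equation (16) says that RSS averages $(n - 2)\sigma^2$ across repeated samples, so dividing by n instead of $n - 2$ would understate $\sigma^2$. The simulation sketch below assumes NumPy and normal errors with hypothetical parameters. It averages RSS over many samples and applies both divisors. A small n makes the difference visible.

```python
import numpy as np

# Hypothetical design with a small n so the n - 2 correction matters
X = np.array([3.0, 5.0, 6.0, 8.0, 11.0, 14.0])
n = X.size
beta1, beta2, sigma2 = 4.0, 1.2, 9.0
x = X - X.mean()

rng = np.random.default_rng(11)
reps = 100_000
Y = beta1 + beta2 * X + rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))  # one sample per row

b2 = Y @ x / np.sum(x ** 2)        # uses sum(x_i) = 0, so sum(x_i Y_i) = sum(x_i y_i)
b1 = Y.mean(axis=1) - b2 * X.mean()
resid = Y - b1[:, None] - b2[:, None] * X
rss = np.sum(resid ** 2, axis=1)

print("average of RSS/(n-2):", rss.mean() / (n - 2))   # Eq. (17); should be near sigma2 = 9
print("average of RSS/n:    ", rss.mean() / n)         # should be near (n-2)/n * sigma2 = 6
```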
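3A.6 Minimum-Variance Property of Least-Squares Estimators

It was shown in Appendix 3A, Section 3A.2, that the least-squares estimator $\hat\beta_2$ is linear as well as unbiased (this holds true of $\hat\beta_1$ too). To show that these estimators are also minimum variance in the class of all linear unbiased estimators, consider the least-squares estimator $\hat\beta_2$:

$$\hat\beta_2 = \sum k_i Y_i \quad \text{where} \quad k_i = \frac{X_i - \bar X}{\sum (X_i - \bar X)^2} = \frac{x_i}{\sum x_i^2} \quad \text{(see Appendix 3A.2)} \tag{19}$$

which shows that $\hat\beta_2$ is a weighted sum of the Y's, with $k_i$ serving as the weights.

Let us define an alternative linear estimator of $\beta_2$ as follows:

$$\beta_2^* = \sum w_i Y_i \tag{20}$$

where $w_i$ are also weights, not necessarily equal to $k_i$. Now

$$\begin{aligned}E(\beta_2^*) &= \sum w_i E(Y_i)\\ &= \sum w_i(\beta_1 + \beta_2 X_i)\\ &= \beta_1\sum w_i + \beta_2\sum w_i X_i\end{aligned} \tag{21}$$

Therefore, for $\beta_2^*$ to be unbiased, we must have

$$\sum w_i = 0 \tag{22}$$

and

$$\sum w_i X_i = 1 \tag{23}$$

Also, we may write

$$\begin{aligned}\operatorname{var}(\beta_2^*) &= \operatorname{var}\sum w_i Y_i\\ &= \sum w_i^2\operatorname{var}Y_i && [\text{Note: } \operatorname{var}Y_i = \operatorname{var}u_i = \sigma^2]\\ &= \sigma^2\sum w_i^2 && [\text{Note: } \operatorname{cov}(Y_i, Y_j) = 0\ (i \neq j)]\\ &= \sigma^2\sum\left(w_i - \frac{x_i}{\sum x_i^2} + \frac{x_i}{\sum x_i^2}\right)^2 && \text{(Note the mathematical trick)}\\ &= \sigma^2\sum\left(w_i - \frac{x_i}{\sum x_i^2}\right)^2 + \sigma^2\frac{\sum x_i^2}{\left(\sum x_i^2\right)^2} + 2\sigma^2\sum\left(w_i - \frac{x_i}{\sum x_i^2}\right)\left(\frac{x_i}{\sum x_i^2}\right)\\ &= \sigma^2\sum\left(w_i - \frac{x_i}{\sum x_i^2}\right)^2 + \sigma^2\left(\frac{1}{\sum x_i^2}\right)\end{aligned} \tag{24}$$

because the last term in the next to the last step drops out. (Why?)

Since the last term in Eq. (24) is constant, the variance of $\beta_2^*$ can be minimized only by manipulating the first term. If we let

$$w_i = \frac{x_i}{\sum x_i^2}$$

Eq. (24) reduces to

$$\operatorname{var}(\beta_2^*) = \frac{\sigma^2}{\sum x_i^2} = \operatorname{var}(\hat\beta_2) \tag{25}$$

In words, with weights $w_i = k_i$, which are the least-squares weights, the variance of the linear estimator $\beta_2^*$ is equal to the variance of the least-squares estimator $\hat\beta_2$; otherwise $\operatorname{var}(\beta_2^*) > \operatorname{var}(\hat\beta_2)$. To put it differently, if there is a minimum-variance linear unbiased estimator of $\beta_2$, it must be the least-squares estimator. Similarly it can be shown that $\hat\beta_1$ is a minimum-variance linear unbiased estimator of $\beta_1$.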
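To see Eq. (24) at work, compare $\hat\beta_2$ with one specific alternative linear unbiased estimator. The sketch assumes NumPy and uses, purely as an illustration, the slope of the line through the first and last observations, $(Y_n - Y_1)/(X_n - X_1)$. Its weights satisfy conditions (22) and (23). X and $\sigma^2$ are hypothetical. The code computes both variances exactly and confirms the decomposition in Eq. (24).

```python
import numpy as np

# Hypothetical fixed regressor values, sorted so X[0] and X[-1] are the end points
X = np.array([2.0, 4.0, 5.0, 7.0, 10.0, 12.0])
sigma2 = 1.0
x = X - X.mean()
k = x / np.sum(x ** 2)                  # least-squares weights k_i

# Alternative linear estimator: slope of the line through the first and last points
w = np.zeros_like(X)
w[0], w[-1] = -1.0, 1.0
w /= X[-1] - X[0]

print(w.sum(), np.sum(w * X))           # conditions (22) and (23): 0 and 1

var_ols = sigma2 * np.sum(k ** 2)                                      # = sigma2 / sum(x_i^2)
var_alt = sigma2 * np.sum(w ** 2)
var_alt_24 = sigma2 * np.sum((w - k) ** 2) + sigma2 / np.sum(x ** 2)   # decomposition in Eq. (24)
print(var_ols, var_alt, var_alt_24)
```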


3A.7 Consistency of Least-Squares Estimators 


We have shown that, in the framework of the classical linear regression model, the least-squares estimators are unbiased (and efficient) in any sample size, small or large. But sometimes, as discussed in Appendix A, an estimator may not satisfy one or more desirable statistical properties in small samples. But as the sample size increases indefinitely, the estimators possess several desirable statistical properties. These properties are known as the large sample, or asymptotic, properties. In this appendix, we will discuss one large sample property, namely, the property of consistency, which is discussed more fully in Appendix A. For the two-variable model we have already shown that the OLS estimator $\hat\beta_2$ is an unbiased estimator of the true $\beta_2$. Now we show that $\hat\beta_2$ is also a consistent estimator of $\beta_2$. As shown in Appendix A, a sufficient condition for consistency is that $\hat\beta_2$ is unbiased and that its variance tends to zero as the sample size n tends to infinity.

Since we have already proved the unbiasedness property, we need only show that the variance of $\hat\beta_2$ tends to zero as n increases indefinitely. We know that

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_i^2} = \frac{\sigma^2/n}{\sum x_i^2/n} \tag{26}$$

By dividing the numerator and denominator by n, we do not change the equality. Now

$$\lim_{n\to\infty}\operatorname{var}(\hat\beta_2) = \lim_{n\to\infty}\frac{\sigma^2/n}{\sum x_i^2/n} = 0 \tag{27}$$

where use is made of the facts that (1) the limit of a ratio is the limit of the numerator divided by the limit of the denominator, provided the latter is not zero (refer to any calculus book); (2) as n tends to infinity, $\sigma^2/n$ tends to zero because $\sigma^2$ is a finite number; and (3) $\left(\sum x_i^2\right)/n$ does not tend to zero, because the variance of X has a finite, nonzero limit because of Assumption 7 of CLRM.

The upshot of the preceding discussion is that the OLS estimator $\hat\beta_2$ is a consistent estimator of true $\beta_2$. In like fashion, we can establish that $\hat\beta_1$ is also a consistent estimator. Thus, in repeated (small) samples, the OLS estimators are unbiased and as the sample size increases indefinitely the OLS estimators are consistent. As we shall see later, even if some of the assumptions of CLRM are not satisfied, we may be able to obtain consistent estimators of the regression coefficients in several situations.
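The limit argument in Eqs. (26) and (27) can be made concrete with a design in which $\sum x_i^2 / n$ settles down as n grows. The sketch assumes NumPy and a hypothetical design that repeats the values 0 through 9. For this design $\sum x_i^2/n$ stays at 8.25 while $\operatorname{var}(\hat\beta_2)$ falls in proportion to 1/n. The value $\sigma^2 = 4$ is also hypothetical.

```python
import numpy as np

sigma2 = 4.0
sxx_over_n = []
variances = []
for n in (10, 100, 1_000, 10_000):
    X = np.arange(n) % 10            # hypothetical design: the values 0, ..., 9 repeated
    x = X - X.mean()
    s = np.sum(x ** 2) / n           # sum(x_i^2)/n, which stays at 8.25 for this design
    var_b2 = (sigma2 / n) / s        # Eq. (26) with numerator and denominator divided by n
    sxx_over_n.append(s)
    variances.append(var_b2)
    print(n, s, var_b2)
```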






Chapter 4


Classical Normal Linear Regression Model (CNLRM)

What is known as the classical theory of statistical inference consists of two branches, namely, estimation and hypothesis testing. We have thus far covered the topic of estimation of the parameters of the (two-variable) linear regression model. Using the method of OLS we were able to estimate the parameters $\beta_1$, $\beta_2$, and $\sigma^2$. Under the assumptions of the classical linear regression model (CLRM), we were able to show that the estimators of these parameters, $\hat\beta_1$, $\hat\beta_2$, and $\hat\sigma^2$, satisfy several desirable statistical properties, such as unbiasedness, minimum variance, etc. (Recall the BLUE property.) Note that, since these are estimators, their values will change from sample to sample. Therefore, these estimators are random variables.

But estimation is only half the battle. Hypothesis testing is the other half. Recall that in regression analysis our objective is not only to estimate the sample regression function (SRF), but also to use it to draw inferences about the population regression function (PRF), as emphasized in Chapter 2. Thus, we would like to find out how close $\hat\beta_1$ is to the true $\beta_1$ or how close $\hat\sigma^2$ is to the true $\sigma^2$. For instance, in Example 3.2, we estimated the SRF as shown in Eq. (3.7.2). But since this regression is based on a sample of 55 families, how do we know that the estimated MPC of 0.4368 represents the (true) MPC in the population as a whole?

Therefore, since $\hat\beta_1$, $\hat\beta_2$, and $\hat\sigma^2$ are random variables, we need to find out their probability distributions, for without that knowledge we will not be able to relate them to their true values.


4.1 The Probability Distribution of Disturbances $u_i$

To find out the probability distributions of the OLS estimators, we proceed as follows. Specifically, consider $\hat\beta_2$. As we showed in Appendix 3A.2,

$$\hat\beta_2 = \sum k_i Y_i \tag{4.1.1}$$

where $k_i = x_i/\sum x_i^2$. But since the X's are assumed fixed, or nonstochastic, because ours is conditional regression analysis, conditional on the fixed values of $X_i$, Eq. (4.1.1) shows that $\hat\beta_2$ is a linear function of $Y_i$, which is random by assumption. But since $Y_i = \beta_1 + \beta_2 X_i + u_i$, we can write Eq. (4.1.1) as

$$\hat\beta_2 = \sum k_i(\beta_1 + \beta_2 X_i + u_i) \tag{4.1.2}$$

Because $k_i$, the betas, and $X_i$ are all fixed, $\hat\beta_2$ is ultimately a linear function of $u_i$, which is random by assumption. Therefore, the probability distribution of $\hat\beta_2$ (and also of $\hat\beta_1$) will depend on the assumption made about the probability distribution of $u_i$. And since knowledge of the probability distributions of OLS estimators is necessary to draw inferences about their population values, the nature of the probability distribution of $u_i$ assumes an extremely important role in hypothesis testing.

Since the method of OLS does not make any assumption about the probabilistic nature of $u_i$, it is of little help for the purpose of drawing inferences about the PRF from the SRF, the Gauss-Markov theorem notwithstanding. This void can be filled if we are willing to assume that the u's follow some probability distribution. For reasons to be explained shortly, in the regression context it is usually assumed that the u's follow the normal distribution. Adding the normality assumption for $u_i$ to the assumptions of the classical linear regression model (CLRM) discussed in Chapter 3, we obtain what is known as the classical normal linear regression model (CNLRM).


4.2 The Normality Assumption for $u_i$

The classical normal linear regression model assumes that each $u_i$ is distributed normally with

$$\text{Mean: } E(u_i) = 0 \tag{4.2.1}$$

$$\text{Variance: } E[u_i - E(u_i)]^2 = E(u_i^2) = \sigma^2 \tag{4.2.2}$$

$$\operatorname{cov}(u_i, u_j)\text{: } E\{[u_i - E(u_i)][u_j - E(u_j)]\} = E(u_iu_j) = 0 \qquad i \neq j \tag{4.2.3}$$

The assumptions given above can be more compactly stated as

$$u_i \sim N(0, \sigma^2) \tag{4.2.4}$$

where the symbol ~ means distributed as and N stands for the normal distribution, the terms in the parentheses representing the two parameters of the normal distribution, namely, the mean and the variance.

As noted in Appendix A, for two jointly normally distributed variables, zero covariance or correlation means independence of the two variables. Therefore, with the normality assumption, Eq. (4.2.4) means that $u_i$ and $u_j$ are not only uncorrelated but are also independently distributed.

Therefore, we can write Eq. (4.2.4) as

$$u_i \sim \text{NID}(0, \sigma^2) \tag{4.2.5}$$


where NID stands for normally and independently distributed. 
Why the Normality Assumption?

Why do we employ the normality assumption? There are several reasons: 

1. As pointed out in Section 2.5, $u_i$ represent the combined influence (on the dependent variable) of a large number of independent variables that are not explicitly introduced in the regression model. As noted, we hope that the influence of these omitted or neglected variables is small and at best random. Now by the celebrated central limit theorem (CLT) of statistics (see Appendix A for details), it can be shown that if there are a large number of independent and identically distributed random variables, then, with a few exceptions, the distribution of their sum tends to a normal distribution as the number of such variables increases indefinitely.¹ It is the CLT that provides a theoretical justification for the assumption of normality of $u_i$.

2. A variant of the CLT states that, even if the number of variables is not very large or if these variables are not strictly independent, their sum may still be normally distributed.²

3. With the normality assumption, the probability distributions of OLS estimators can be 
easily derived because, as noted in Appendix A, one property of the normal distribution is 
that any linear function of normally distributed variables is itself normally distributed. 
As we discussed earlier, OLS estimators $\hat\beta_1$ and $\hat\beta_2$ are linear functions of $u_i$. Therefore, if $u_i$ are normally distributed, so are $\hat\beta_1$ and $\hat\beta_2$, which makes our task of hypothesis testing very straightforward.

4. The normal distribution is a comparatively simple distribution involving only two 
parameters (mean and variance); it is very well known and its theoretical properties have 
been extensively studied in mathematical statistics. Besides, many phenomena seem to 
follow the normal distribution. 

5. If we are dealing with a small, or finite, sample size, say data of fewer than 100 observations, the normality assumption assumes a critical role. It not only helps us to derive the exact probability distributions of OLS estimators but also enables us to use the t, F, and $\chi^2$ statistical tests for regression models. The statistical properties of the t, F, and $\chi^2$ probability distributions are discussed in Appendix A. As we will show subsequently, if the sample size is reasonably large, we may be able to relax the normality assumption.

6. Finally, in large samples, t and F statistics have approximately the t and F probability distributions so that the t and F tests that are based on the assumption that the error term is normally distributed can still be applied validly.³ These days there are many cross-section
and time series data that have a fairly large number of observations. Therefore, the normality 
assumption may not be very crucial in large data sets. 

A cautionary note: Since we are “imposing” the normality assumption, it behooves us to 
find out in practical applications involving small sample size data whether the normality 


1 For a relatively simple and straightforward discussion of this theorem, see Sheldon M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, 2d ed., Harcourt Academic Press, New York, 2000, pp. 193-194. One exception to the theorem is the Cauchy distribution, which has no mean or higher moments. See M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Charles Griffin & Co., London, 1960, vol. 1, pp. 248-249.

2 For the various forms of the CLT, see Harald Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, NJ, 1946, Chap. 17.

3 For a technical discussion on this point, see Christiaan Heij et al., Econometric Methods with Applications in Business and Economics, Oxford University Press, Oxford, 2004, p. 197.
assumption is appropriate. Later, we will develop some tests to do just that. Also, later we
will come across situations where the normality assumption may be inappropriate. But until 
then we will continue with the normality assumption for the reasons discussed previously. 


4.3 Properties of OLS Estimators under 
the Normality Assumption 


With the assumption that $u_i$ follow the normal distribution as in Eq. (4.2.5), the OLS
estimators have the following properties (Appendix A provides a general discussion of the 
desirable statistical properties of estimators): 

1. They are unbiased. 

2. They have minimum variance. Combined with 1, this means that they are minimum- 
variance unbiased, or efficient estimators. 

3. They are consistent; that is, as the sample size increases indefinitely, the estimators converge to their true population values.

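4. $\hat\beta_1$ (being a linear function of $u_i$) is normally distributed with

$$\text{Mean: } E(\hat\beta_1) = \beta_1 \tag{4.3.1}$$

$$\operatorname{var}(\hat\beta_1)\text{: } \sigma^2_{\hat\beta_1} = \frac{\sum X_i^2}{n\sum x_i^2}\,\sigma^2 \tag{4.3.2}$$

Or more compactly,

$$\hat\beta_1 \sim N\left(\beta_1, \sigma^2_{\hat\beta_1}\right)$$

Then by the properties of the normal distribution, the variable Z, which is defined as

$$Z = \frac{\hat\beta_1 - \beta_1}{\sigma_{\hat\beta_1}} \tag{4.3.3}$$

follows the standard normal distribution, that is, a normal distribution with zero mean and unit (= 1) variance, or

$$Z \sim N(0, 1)$$

5. $\hat\beta_2$ (being a linear function of $u_i$) is normally distributed with

$$\text{Mean: } E(\hat\beta_2) = \beta_2 \tag{4.3.4}$$

$$\operatorname{var}(\hat\beta_2)\text{: } \sigma^2_{\hat\beta_2} = \frac{\sigma^2}{\sum x_i^2} \quad \text{[Eq. (3.3.1)]} \tag{4.3.5}$$

Or, more compactly,

$$\hat\beta_2 \sim N\left(\beta_2, \sigma^2_{\hat\beta_2}\right)$$

Then, as in Eq. (4.3.3),

$$Z = \frac{\hat\beta_2 - \beta_2}{\sigma_{\hat\beta_2}} \tag{4.3.6}$$

also follows the standard normal distribution.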

Geometrically, the probability distributions of $\hat\beta_1$ and $\hat\beta_2$ are shown in Figure 4.1.
6. $(n - 2)(\hat\sigma^2/\sigma^2)$ is distributed as the $\chi^2$ (chi-square) distribution with $(n - 2)$ df.⁴ This knowledge will help us to draw inferences about the true $\sigma^2$ from the estimated $\hat\sigma^2$, as we will show in Chapter 5. (The chi-square distribution and its properties are discussed in Appendix A.)

7. $(\hat\beta_1, \hat\beta_2)$ are distributed independently of $\hat\sigma^2$. The importance of this will be explained in the next chapter.

8. $\hat\beta_1$ and $\hat\beta_2$ have minimum variance in the entire class of unbiased estimators, whether linear or not. This result, due to Rao, is very powerful because, unlike the Gauss-Markov theorem, it is not restricted to the class of linear estimators only.⁵ Therefore, we can say that the least-squares estimators are best unbiased estimators (BUE); that is, they have minimum variance in the entire class of unbiased estimators.

To sum up: The important point to note is that the normality assumption enables us to derive the probability, or sampling, distributions of $\hat\beta_1$ and $\hat\beta_2$ (both normal) and $\hat\sigma^2$ (related to the chi square). As we will see in the next chapter, this simplifies the task of establishing confidence intervals and testing (statistical) hypotheses.

In passing, note that, with the assumption that $u_i \sim N(0, \sigma^2)$, $Y_i$, being a linear function of $u_i$, is itself normally distributed with the mean and variance given by

$$E(Y_i) = \beta_1 + \beta_2 X_i \tag{4.3.7}$$

$$\operatorname{var}(Y_i) = \sigma^2 \tag{4.3.8}$$

More neatly, we can write

$$Y_i \sim N(\beta_1 + \beta_2 X_i, \sigma^2) \tag{4.3.9}$$


4 The proof of this statement is slightly involved. An accessible source for the proof is Robert V. Hogg 
and Allen T. Craig, Introduction to Mathematical Statistics, 2d ed., Macmillan, New York, 1965, p. 144. 

5 C. R. Rao, Linear Statistical Inference and Its Applications, John Wiley & Sons, New York, 1965, p. 258. 
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4.4 The Method of Maximum Likelihood (ML) 

A method of point estimation with some stronger theoretical properties than the method of OLS is the method of maximum likelihood (ML). Since this method is slightly involved, it is discussed in the appendix to this chapter. For the general reader, it will suffice to note that if the $u_i$ are assumed to be normally distributed, as we have done for reasons already discussed, the ML and OLS estimators of the regression coefficients, the $\beta$'s, are identical, and this is true of simple as well as multiple regressions. The ML estimator of $\sigma^2$ is $\sum \hat u_i^2/n$. This estimator is biased, whereas the OLS estimator $\hat\sigma^2 = \sum \hat u_i^2/(n-2)$, as we have seen, is unbiased. But comparing these two estimators of $\sigma^2$, we see that as the sample size $n$ gets larger the two estimators of $\sigma^2$ tend to be equal. Thus, asymptotically (i.e., as $n$ increases indefinitely), the ML estimator of $\sigma^2$ is also unbiased.
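Because both estimators of $\sigma^2$ use the same residual sum of squares $\sum \hat u_i^2$, they differ only by the factor $(n-2)/n$. The sketch below (the function name and the RSS values are illustrative) plugs in an RSS equal to its expected value $(n-2)\sigma^2$ with $\sigma^2 = 4$, so the OLS estimator returns exactly 4 while the ML estimator returns $4(n-2)/n$, which approaches 4 as $n$ grows.

```python
def variance_estimators(rss: float, n: int) -> tuple[float, float]:
    """Return the (OLS, ML) estimators of sigma^2 computed from the same residual sum of squares."""
    sigma2_ols = rss / (n - 2)  # unbiased: divides by the n - 2 degrees of freedom
    sigma2_ml = rss / n         # maximum likelihood: divides by the sample size n
    return sigma2_ols, sigma2_ml


# Illustrative RSS values equal to their expected value (n - 2) * sigma^2 with sigma^2 = 4.
for n in (5, 20, 100, 1000):
    ols, ml = variance_estimators(rss=4.0 * (n - 2), n=n)
    print(f"n = {n:4d}   OLS = {ols:.4f}   ML = {ml:.4f}   ML/OLS = {ml / ols:.4f}")
```

The ML/OLS ratio printed in the last column is simply $(n-2)/n$: 0.6 at $n = 5$ but 0.998 at $n = 1000$.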

Since the method of least squares with the added assumption of normality of $u_i$ provides us with all the tools necessary for both estimation and hypothesis testing of the linear regression models, there is no loss for readers who may not want to pursue the maximum likelihood method because of its slight mathematical complexity.


Summary and Conclusions


1. This chapter discussed the classical normal linear regression model (CNLRM). 

2. This model differs from the classical linear regression model (CLRM) in that it specifically assumes that the disturbance term $u_i$ entering the regression model is normally distributed. The CLRM does not require any assumption about the probability distribution of $u_i$; it only requires that the mean value of $u_i$ is zero and its variance is a finite constant.

3. The theoretical justification for the normality assumption is the central limit theorem. 

4. Without the normality assumption, under the other assumptions discussed in Chapter 3, 
the Gauss-Markov theorem showed that the OLS estimators are BLUE. 

5. With the additional assumption of normality, the OLS estimators are not only best 
unbiased estimators (BUE) but also follow well-known probability distributions. The 
OLS estimators of the intercept and slope are themselves normally distributed and 
the OLS estimator of the variance of $u_i$ ($= \sigma^2$) is related to the chi-square distribution.

6. In Chapters 5 and 8 we show how this knowledge is useful in drawing inferences about 
the values of the population parameters. 

7. An alternative to the least-squares method is the method of maximum likelihood (ML). To use this method, however, one must make an assumption about the probability distribution of the disturbance term $u_i$. In the regression context, the assumption most popularly made is that $u_i$ follows the normal distribution.

8. Under the normality assumption, the ML and OLS estimators of the intercept and slope parameters of the regression model are identical. However, the OLS and ML estimators of the variance of $u_i$ are different. In large samples, however, these two estimators converge.

9. Thus the ML method is generally called a large-sample method. The ML method is of broader application in that it can also be applied to regression models that are nonlinear in the parameters. In the latter case, OLS is generally not used. For more on this, see Chapter 14.

10. In this text, we will largely rely on the OLS method for practical reasons: (a) Compared to ML, the OLS is easy to apply; (b) the ML and OLS estimators of $\beta_1$ and $\beta_2$ are identical (which is true of multiple regressions too); and (c) even in moderately large samples the OLS and ML estimators of $\sigma^2$ do not differ vastly.


However, for the benefit of the mathematically inclined reader, a brief introduction to 
ML is given in the appendix to this chapter and also in Appendix A. 
Appendix 4A


4A.1 Maximum Likelihood Estimation 
of Two-Variable Regression Model 


Assume that in the two-variable model $Y_i = \beta_1 + \beta_2 X_i + u_i$ the $Y_i$ are normally and independently distributed with mean $= \beta_1 + \beta_2 X_i$ and variance $= \sigma^2$. (See Eq. [4.3.9].) As a result, the joint probability density function of $Y_1, Y_2, \ldots, Y_n$, given the preceding mean and variance, can be written as

$$f(Y_1, Y_2, \ldots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2)$$

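But in view of the independence of the $Y$'s, this joint probability density function can be written as a product of $n$ individual density functions as

$$f(Y_1, Y_2, \ldots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2) = f(Y_1 \mid \beta_1 + \beta_2 X_i, \sigma^2)\, f(Y_2 \mid \beta_1 + \beta_2 X_i, \sigma^2) \cdots f(Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2) \qquad (1)$$

where

$$f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2}\right\} \qquad (2)$$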


which is the density function of a normally distributed variable with the given mean and variance. 
(Note: exp means e to the power of the expression indicated by {}.) 

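Substituting Equation (2) for each $Y_i$ into Equation (1) gives

$$f(Y_1, Y_2, \ldots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2) = \frac{1}{\sigma^n (\sqrt{2\pi})^n} \exp\left\{-\frac{1}{2}\sum \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2}\right\} \qquad (3)$$

If $Y_1, Y_2, \ldots, Y_n$ are known or given, but $\beta_1$, $\beta_2$, and $\sigma^2$ are not known, the function in Equation (3) is called a likelihood function, denoted by $\mathrm{LF}(\beta_1, \beta_2, \sigma^2)$, and written as¹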


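$$\mathrm{LF}(\beta_1, \beta_2, \sigma^2) = \frac{1}{\sigma^n (\sqrt{2\pi})^n} \exp\left\{-\frac{1}{2}\sum \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2}\right\} \qquad (4)$$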


The method of maximum likelihood, as the name indicates, consists in estimating the unknown parameters in such a manner that the probability of observing the given $Y$'s is as high (or maximum) as possible. Therefore, we have to find the maximum of the function in Equation (4). This is a straightforward exercise in differential calculus. For differentiation it is easier to express Equation (4) in the log term as follows.² (Note: ln = natural log.)

$$\ln \mathrm{LF} = -n \ln \sigma - \frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2}$$

$$\qquad = -\frac{n}{2}\ln \sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \qquad (5)$$


1 Of course, if $\beta_1$, $\beta_2$, and $\sigma^2$ are known but the $Y_i$ are not known, Eq. (4) represents the joint probability density function: the probability of jointly observing the $Y_i$.

2 Since a log function is a monotonic function, ln LF will attain its maximum value at the same point as LF.
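Differentiating Equation (5) partially with respect to $\beta_1$, $\beta_2$, and $\sigma^2$, we obtain

$$\frac{\partial \ln \mathrm{LF}}{\partial \beta_1} = -\frac{1}{\sigma^2}\sum (Y_i - \beta_1 - \beta_2 X_i)(-1) \qquad (6)$$

$$\frac{\partial \ln \mathrm{LF}}{\partial \beta_2} = -\frac{1}{\sigma^2}\sum (Y_i - \beta_1 - \beta_2 X_i)(-X_i) \qquad (7)$$

$$\frac{\partial \ln \mathrm{LF}}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum (Y_i - \beta_1 - \beta_2 X_i)^2 \qquad (8)$$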

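Setting these equations equal to zero (the first-order condition for optimization) and letting $\tilde\beta_1$, $\tilde\beta_2$, and $\tilde\sigma^2$ denote the ML estimators, we obtain³

$$\frac{1}{\tilde\sigma^2}\sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i) = 0 \qquad (9)$$

$$\frac{1}{\tilde\sigma^2}\sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i) X_i = 0 \qquad (10)$$

$$-\frac{n}{2\tilde\sigma^2} + \frac{1}{2\tilde\sigma^4}\sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i)^2 = 0 \qquad (11)$$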

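After simplifying, Eqs. (9) and (10) yield

$$\sum Y_i = n\tilde\beta_1 + \tilde\beta_2 \sum X_i \qquad (12)$$

$$\sum Y_i X_i = \tilde\beta_1 \sum X_i + \tilde\beta_2 \sum X_i^2 \qquad (13)$$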


which are precisely the normal equations of the least-squares theory obtained in Eqs. (3.1.4) and (3.1.5). Therefore, the ML estimators, the $\tilde\beta$'s, are the same as the OLS estimators, the $\hat\beta$'s, given in Eqs. (3.1.6) and (3.1.7). This equality is not accidental. Examining the likelihood (5), we see that the last term enters with a negative sign. Therefore, maximizing Equation (5) amounts to minimizing this term, which is precisely the least-squares approach, as can be seen from Eq. (3.1.2).
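The equivalence can be checked numerically. The sketch below (function names and the six data points are illustrative, not from the text) computes the OLS estimates from the normal equations (12) and (13), written in the equivalent deviation form, and evaluates the log-likelihood of Eq. (5). For any fixed $\sigma^2$, moving either coefficient away from its OLS value raises the residual sum of squares and therefore lowers ln LF.

```python
import math


def ols_fit(x, y):
    """Solve the normal equations (12) and (13) for the two-variable model."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


def log_likelihood(b1, b2, sigma2, x, y):
    """ln LF of Eq. (5): -(n/2) ln sigma^2 - (n/2) ln(2 pi) - (1/2) * RSS / sigma^2."""
    n = len(x)
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(sigma2) - 0.5 * n * math.log(2 * math.pi) - 0.5 * rss / sigma2


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]
b1, b2 = ols_fit(x, y)

sigma2 = 1.0  # any fixed sigma^2 > 0; the maximizing (b1, b2) does not depend on it
best = log_likelihood(b1, b2, sigma2, x, y)
for d1, d2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    # Moving away from the OLS values raises the RSS and therefore lowers ln LF.
    assert log_likelihood(b1 + d1, b2 + d2, sigma2, x, y) < best
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, ln LF = {best:.4f}")
```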

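Substituting the ML (= OLS) estimators into Equation (11) and simplifying, we obtain the ML estimator of $\sigma^2$ as

$$\tilde\sigma^2 = \frac{1}{n}\sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i)^2 = \frac{1}{n}\sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2 = \frac{1}{n}\sum \hat u_i^2 \qquad (14)$$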


From Equation (14) it is obvious that the ML estimator $\tilde\sigma^2$ differs from the OLS estimator $\hat\sigma^2 = [1/(n-2)]\sum \hat u_i^2$, which was shown to be an unbiased estimator of $\sigma^2$ in Appendix 3A, Section 3A.5. Thus, the ML estimator of $\sigma^2$ is biased. The magnitude of this bias can be easily determined as follows.


3 We use ~ (tilde) for ML estimators and ^ (cap or hat) for OLS estimators.
Taking the mathematical expectation of Equation (14) on both sides and using Eq. (16) of Appendix 3A, Section 3A.5, we obtain

$$E(\tilde\sigma^2) = \frac{1}{n} E\left(\sum \hat u_i^2\right) = \left(\frac{n-2}{n}\right)\sigma^2 = \sigma^2 - \frac{2}{n}\sigma^2 \qquad (15)$$


which shows that $\tilde\sigma^2$ is biased downward (i.e., it underestimates the true $\sigma^2$) in small samples. But notice that as $n$, the sample size, increases indefinitely, the second term in Equation (15), the bias factor, tends to be zero. Therefore, asymptotically (i.e., in a very large sample), $\tilde\sigma^2$ is unbiased too, that is, $\lim E(\tilde\sigma^2) = \sigma^2$ as $n \to \infty$. It can further be proved that $\tilde\sigma^2$ is also a consistent estimator⁴; that is, as $n$ increases indefinitely, $\tilde\sigma^2$ converges to its true value $\sigma^2$.
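Eq. (15) can be illustrated by simulation. The sketch below assumes illustrative values that are not from the text ($\beta_1 = 2$, $\beta_2 = 0.5$, $\sigma^2 = 4$, $X_i = 0, 1, \ldots, n-1$) and a hypothetical helper name. Averaged over many samples, the OLS estimator should sit near $\sigma^2 = 4$, the ML estimator near $(n-2)\sigma^2/n$, and the gap between them should shrink as $n$ grows.

```python
import random


def mean_variance_estimates(n, sigma2=4.0, reps=10_000, seed=1):
    """Average OLS and ML estimates of sigma^2 over simulated samples of size n."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]              # fixed regressor values (illustrative)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sum_ols = sum_ml = 0.0
    for _ in range(reps):
        y = [2.0 + 0.5 * xi + rng.gauss(0.0, sigma2 ** 0.5) for xi in x]
        y_bar = sum(y) / n
        b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b1 = y_bar - b2 * x_bar
        rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
        sum_ols += rss / (n - 2)                  # unbiased OLS estimator
        sum_ml += rss / n                         # ML estimator, Eq. (14)
    return sum_ols / reps, sum_ml / reps


for n in (5, 10, 50):
    ols, ml = mean_variance_estimates(n)
    print(f"n = {n:2d}  mean OLS = {ols:.3f}  mean ML = {ml:.3f}  (n-2)/n * 4 = {(n - 2) / n * 4:.3f}")
```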

4A.2 Maximum Likelihood Estimation 
of Food Expenditure in India 

Return to Example 3.2 and Equation 3.7.2, which gives the regression of food expenditure on total expenditure for 55 rural households in India. Since under the normality assumption the OLS and ML estimators of the regression coefficients are the same, we obtain the ML estimators as $\tilde\beta_1 = \hat\beta_1 = 94.2087$ and $\tilde\beta_2 = \hat\beta_2 = 0.4368$. The OLS estimator of $\sigma^2$ is $\hat\sigma^2 = 4469.6913$, but the ML estimator is $\tilde\sigma^2 = 4307.1563$, which is smaller than the OLS estimator. As noted, in small samples the ML estimator is downward biased; that is, on average it underestimates the true variance $\sigma^2$. Of course, as you would expect, as the sample size gets bigger, the difference between the two estimators will narrow. Putting the values of the estimators in the log likelihood function, we obtain the value of -308.1625. If you want the maximum value of the LF, just take the antilog of -308.1625. No other values of the parameters will give you a higher probability of obtaining the sample that you have used in the analysis.
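As a check on the figures just reported, the sketch below recomputes the ML variance estimate as $\tilde\sigma^2 = (n-2)\hat\sigma^2/n$ and the maximized log-likelihood, using only two inputs taken from the text, $n = 55$ and $\hat\sigma^2 = 4469.6913$. At the ML estimates $\sum \hat u_i^2 = n\tilde\sigma^2$, so Eq. (5) of Appendix 4A collapses to $-\tfrac{n}{2}[\ln(2\pi) + \ln\tilde\sigma^2 + 1]$.

```python
import math

n = 55                      # rural households in the sample
sigma2_ols = 4469.6913      # OLS estimate of sigma^2 reported in the text

# The ML estimator divides the same RSS by n instead of n - 2.
rss = sigma2_ols * (n - 2)
sigma2_ml = rss / n

# At the ML estimates, RSS / sigma2_ml = n, so Eq. (5) reduces to this expression.
max_log_lf = -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2_ml) + 1)

print(f"ML estimate of sigma^2: {sigma2_ml:.4f}")
print(f"Maximized ln LF: {max_log_lf:.4f}")
```

It prints roughly 4307.16 for $\tilde\sigma^2$ and -308.1625 for the maximized ln LF; the tiny difference from the fourth decimal of a reported $\tilde\sigma^2$ comes from rounding in $\hat\sigma^2$.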

Appendix 4A Exercises 


4.1. “If two random variables are statistically independent, the coefficient of correlation between the 
two is zero. But the converse is not necessarily true; that is, zero correlation does not imply 
statistical independence. However, if two variables are normally distributed, zero correlation 
necessarily implies statistical independence.” Verify this statement for the following joint 
probability density function of two normally distributed variables $Y_1$ and $Y_2$ (this joint probability density function is known as the bivariate normal probability density function):

$$f(Y_1, Y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{Y_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\frac{(Y_1-\mu_1)(Y_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{Y_2-\mu_2}{\sigma_2}\right)^2\right]\right\}$$

where $\mu_1$ = mean of $Y_1$
$\mu_2$ = mean of $Y_2$
$\sigma_1$ = standard deviation of $Y_1$
$\sigma_2$ = standard deviation of $Y_2$
$\rho$ = coefficient of correlation between $Y_1$ and $Y_2$

4 See Appendix A for a general discussion of the properties of the maximum likelihood estimators as well as for the distinction between asymptotic unbiasedness and consistency. Roughly speaking, in asymptotic unbiasedness we try to find out the $\lim E(\tilde\sigma^2)$ as $n$ tends to infinity, where $n$ is the sample size on which the estimator is based, whereas in consistency we try to find out how $\tilde\sigma^2$ behaves as $n$ increases indefinitely. Notice that the unbiasedness property is a repeated sampling property of an estimator based on a sample of given size, whereas in consistency we are concerned with the behavior of an estimator as the sample size increases indefinitely.

4.2. By applying the second-order conditions for optimization (i.e., second-derivative test), show that the ML estimators of $\beta_1$, $\beta_2$, and $\sigma^2$ obtained by solving Eqs. (9), (10), and (11) do in fact maximize the likelihood function in Eq. (4).

4.3. A random variable $X$ follows the exponential distribution if it has the following probability density function (PDF):

$$f(X) = \frac{1}{\theta} e^{-X/\theta} \quad \text{for } X > 0$$

$$\quad\;\; = 0 \quad \text{elsewhere}$$

where $\theta > 0$ is the parameter of the distribution. Using the ML method, show that the ML estimator of $\theta$ is $\tilde\theta = \sum X_i/n$, where $n$ is the sample size. That is, show that the ML estimator of $\theta$ is the sample mean $\bar X$.

4.4. Suppose that the outcome of an experiment is classified as either a success or a failure. Letting 
X = 1 when the outcome is a success and X = 0 when it is a failure, the probability density, or 
mass, function of X is given by 

$$p(X = 0) = 1 - p$$
$$p(X = 1) = p, \quad 0 < p < 1$$

What is the maximum likelihood estimator of p, the probability of success? 



Chapter 5

Two-Variable Regression: Interval Estimation and Hypothesis Testing

Beware of testing too many hypotheses; the more you torture the data, the more likely they are 
to confess, but confession obtained under duress may not be admissible in the court of scientific 
opinion. 1 

As pointed out in Chapter 4, estimation and hypothesis testing constitute the two major 
branches of classical statistics. The theory of estimation consists of two parts: point 
estimation and interval estimation. We have discussed point estimation thoroughly in the 
previous two chapters where we introduced the OLS and ML methods of point estimation. 
In this chapter we first consider interval estimation and then take up the topic of hypothesis 
testing, a topic intimately related to interval estimation. 


5.1 Statistical Prerequisites 

Before we demonstrate the actual mechanics of establishing confidence intervals and testing statistical hypotheses, we assume that the reader is familiar with the fundamental concepts of probability and statistics. Although not a substitute for a basic course
in statistics, Appendix A provides the essentials of statistics with which the reader 
should be totally familiar. Key concepts such as probability, probability distributions, 
Type I and Type II errors, level of significance, power of a statistical test, and 
confidence interval are crucial for understanding the material covered in this and the 
following chapters. 


1 Stephen M. Stigler, "Testing Hypothesis or Fitting Models? Another Look at Mass Extinctions," in
Matthew H. Nitecki and Antoni Hoffman, eds., Neutral Models in Biology, Oxford University Press, 
Oxford, 1987, p. 148. 
5.2 Interval Estimation: Some Basic Ideas


To fix the ideas, consider the wages-education example of Chapter 3. Equation (3.6.1) shows that the estimated average increase in mean hourly wage related to a one-year increase in schooling ($\hat\beta_2$) is 0.7240, which is a single-number (point) estimate of the unknown population value $\beta_2$. How reliable is this estimate? As noted in Chapter 3, because of sampling fluctuations, a single estimate is likely to differ from the true value, although in repeated sampling its mean value is expected to be equal to the true value. [Note: $E(\hat\beta_2) = \beta_2$.] Now in statistics, the reliability of a point estimator is measured by its standard error. Therefore, instead of relying on the point estimate alone, we may construct an interval around the point estimator, say within two or three standard errors on either side of the point estimator, such that this interval has, say, 95 percent probability of including the true parameter value. This is roughly the idea behind interval estimation.

To be more specific, assume that we want to find out how “close,” say, $\hat\beta_2$ is to $\beta_2$. For this purpose we try to find out two positive numbers $\delta$ and $\alpha$, the latter lying between 0 and 1, such that the probability that the random interval $(\hat\beta_2 - \delta, \hat\beta_2 + \delta)$ contains the true $\beta_2$ is $1 - \alpha$. Symbolically,

$$\Pr(\hat\beta_2 - \delta \le \beta_2 \le \hat\beta_2 + \delta) = 1 - \alpha \qquad (5.2.1)$$

Such an interval, if it exists, is known as a confidence interval; $1 - \alpha$ is known as the confidence coefficient; and $\alpha$ ($0 < \alpha < 1$) is known as the level of significance.² The endpoints of the confidence interval are known as the confidence limits (also known as critical values), $\hat\beta_2 - \delta$ being the lower confidence limit and $\hat\beta_2 + \delta$ the upper confidence limit. In passing, note that in practice $\alpha$ and $1 - \alpha$ are often expressed in percentage forms as $100\alpha$ and $100(1 - \alpha)$ percent.

Equation 5.2.1 shows that an interval estimator, in contrast to a point estimator, is an interval constructed in such a manner that it has a specified probability $1 - \alpha$ of including within its limits the true value of the parameter. For example, if $\alpha = 0.05$, or 5 percent, Eq. (5.2.1) would read: The probability that the (random) interval shown there includes the true $\beta_2$ is 0.95, or 95 percent. The interval estimator thus gives a range of values within which the true $\beta_2$ may lie.

It is very important to know the following aspects of interval estimation: 

1. Eq. (5.2.1) does not say that the probability of $\beta_2$ lying between the given limits is $1 - \alpha$. Since $\beta_2$, although an unknown, is assumed to be some fixed number, either it lies in the interval or it does not. What Eq. (5.2.1) states is that, for the method described in this chapter, the probability of constructing an interval that contains $\beta_2$ is $1 - \alpha$.

2. The interval in Eq. (5.2.1) is a random interval; that is, it will vary from one sample to the next because it is based on $\hat\beta_2$, which is random. (Why?)

3. Since the confidence interval is random, the probability statements attached to it should be understood in the long-run sense, that is, repeated sampling. More specifically, Eq. (5.2.1) means: If in repeated sampling confidence intervals like it are constructed a great many times on the $1 - \alpha$ probability basis, then, in the long run, on the average, such intervals will enclose in $1 - \alpha$ of the cases the true value of the parameter.

4. As noted in (2), the interval in Eq. (5.2.1) is random so long as $\hat\beta_2$ is not known. But once we have a specific sample and once we obtain a specific numerical value of $\hat\beta_2$, the interval in Eq. (5.2.1) is no longer random; it is fixed. In this case, we cannot make the probabilistic statement in Eq. (5.2.1); that is, we cannot say that the probability is $1 - \alpha$ that a given fixed interval includes the true $\beta_2$. In this situation, $\beta_2$ is either in the fixed interval or outside it. Therefore, the probability is either 1 or 0. Thus, for our wages-education example, if the 95 percent confidence interval were obtained as $(0.5700 \le \beta_2 \le 0.8780)$, as we do shortly in Eq. (5.3.9), we cannot say the probability is 95 percent that this interval includes the true $\beta_2$. That probability is either 1 or 0.

2 Also known as the probability of committing a Type I error. A Type I error consists in rejecting a true hypothesis, whereas a Type II error consists in accepting a false hypothesis. (This topic is discussed more fully in Appendix A.) The symbol $\alpha$ is also known as the size of the (statistical) test.
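The repeated-sampling interpretation in points 3 and 4 can be illustrated by simulation. The sketch below borrows the $t$-based interval $\hat\beta_2 \pm t_{\alpha/2}\operatorname{se}(\hat\beta_2)$ developed in Section 5.3, with $n = 13$ and the critical value 2.201 for 11 df used there; the true parameters, the $X$ values, and the function name are arbitrary illustrative choices. Each simulated sample yields a different interval, and the share of intervals that cover the fixed true $\beta_2$ should be close to 0.95, even though any single interval either contains $\beta_2$ or does not.

```python
import random


def ci_coverage(n=13, beta1=1.0, beta2=0.7, sigma=1.0, t_crit=2.201, reps=20_000, seed=7):
    """Share of simulated intervals b2 +/- t_crit * se(b2) that contain the true beta2."""
    rng = random.Random(seed)
    x = [float(i) for i in range(6, 6 + n)]      # fixed regressor values (illustrative)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    hits = 0
    for _ in range(reps):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b1 = y_bar - b2 * x_bar
        sigma2_hat = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
        se_b2 = (sigma2_hat / sxx) ** 0.5         # estimated standard error of b2
        if b2 - t_crit * se_b2 <= beta2 <= b2 + t_crit * se_b2:
            hits += 1
    return hits / reps


print(f"share of intervals that cover the true beta2: {ci_coverage():.3f}")
```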

How are the confidence intervals constructed? From the preceding discussion one may expect that if the sampling or probability distributions of the estimators are known, one can make confidence interval statements such as Eq. (5.2.1). In Chapter 4 we saw that under the assumption of normality of the disturbances $u_i$ the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ are themselves normally distributed and that the OLS estimator $\hat\sigma^2$ is related to the $\chi^2$ (chi-square) distribution. It would then seem that the task of constructing confidence intervals is a simple one. And it is!


5.3 Confidence Intervals for Regression Coefficients $\beta_1$ and $\beta_2$
Confidence Interval for $\beta_2$

It was shown in Chapter 4, Section 4.3, that, with the normality assumption for $u_i$, the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ are themselves normally distributed with means and variances given therein. Therefore, for example, the variable

$$Z = \frac{\hat\beta_2 - \beta_2}{\operatorname{se}(\hat\beta_2)} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\sigma} \qquad (5.3.1)$$

as noted in Eq. (4.3.6), is a standardized normal variable. It therefore seems that we can use the normal distribution to make probabilistic statements about $\beta_2$ provided the true population variance $\sigma^2$ is known. If $\sigma^2$ is known, an important property of a normally distributed variable with mean $\mu$ and variance $\sigma^2$ is that the area under the normal curve between $\mu \pm \sigma$ is about 68 percent, that between the limits $\mu \pm 2\sigma$ is about 95 percent, and that between $\mu \pm 3\sigma$ is about 99.7 percent.
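These areas follow from the standard normal distribution function: for any normal variable, $\Pr(\mu - k\sigma < X < \mu + k\sigma) = \Pr(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$. A minimal sketch using Python's `math.erf` (the function name `prob_within` is illustrative):

```python
import math


def prob_within(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for a normal X, i.e. P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {prob_within(k):.4f}")
```

It prints 0.6827, 0.9545, and 0.9973, the exact values behind the rounded 68, 95, and 99.7 percent figures.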

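But $\sigma^2$ is rarely known, and in practice it is determined by the unbiased estimator $\hat\sigma^2$. If we replace $\sigma$ by $\hat\sigma$, Eq. (5.3.1) may be written as

$$t = \frac{\hat\beta_2 - \beta_2}{\operatorname{se}(\hat\beta_2)} = \frac{\text{Estimator} - \text{Parameter}}{\text{Estimated standard error of estimator}} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\hat\sigma} \qquad (5.3.2)$$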
where the $\operatorname{se}(\hat\beta_2)$ now refers to the estimated standard error. It can be shown (see Appendix 5A, Section 5A.2) that the $t$ variable thus defined follows the $t$ distribution with $n - 2$ df. [Note the difference between Eqs. (5.3.1) and (5.3.2).] Therefore, instead of using the normal distribution, we can use the $t$ distribution to establish a confidence interval for $\beta_2$ as follows:


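$$\Pr(-t_{\alpha/2} \le t \le t_{\alpha/2}) = 1 - \alpha \qquad (5.3.3)$$

where the $t$ value in the middle of this double inequality is the $t$ value given by Equation 5.3.2 and where $t_{\alpha/2}$ is the value of the $t$ variable obtained from the $t$ distribution for $\alpha/2$ level of significance and $n - 2$ df; it is often called the critical $t$ value at $\alpha/2$ level of significance. Substitution of Eq. (5.3.2) into Equation 5.3.3 yields

$$\Pr\left[-t_{\alpha/2} \le \frac{\hat\beta_2 - \beta_2}{\operatorname{se}(\hat\beta_2)} \le t_{\alpha/2}\right] = 1 - \alpha \qquad (5.3.4)$$

Rearranging Equation 5.3.4, we obtain³

$$\Pr[\hat\beta_2 - t_{\alpha/2}\operatorname{se}(\hat\beta_2) \le \beta_2 \le \hat\beta_2 + t_{\alpha/2}\operatorname{se}(\hat\beta_2)] = 1 - \alpha \qquad (5.3.5)$$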

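Equation 5.3.5 provides a $100(1 - \alpha)$ percent confidence interval for $\beta_2$, which can be written more compactly as

$100(1 - \alpha)\%$ confidence interval for $\beta_2$:

$$\hat\beta_2 \pm t_{\alpha/2}\operatorname{se}(\hat\beta_2) \qquad (5.3.6)$$

Arguing analogously, and using Eqs. (4.3.1) and (4.3.2), we can then write:

$$\Pr[\hat\beta_1 - t_{\alpha/2}\operatorname{se}(\hat\beta_1) \le \beta_1 \le \hat\beta_1 + t_{\alpha/2}\operatorname{se}(\hat\beta_1)] = 1 - \alpha \qquad (5.3.7)$$

or, more compactly,

$100(1 - \alpha)\%$ confidence interval for $\beta_1$:

$$\hat\beta_1 \pm t_{\alpha/2}\operatorname{se}(\hat\beta_1) \qquad (5.3.8)$$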

Notice an important feature of the confidence intervals given in Equations 5.3.6 and 
5.3.8: In both cases the width of the confidence interval is proportional to the standard 
error of the estimator. That is, the larger the standard error, the larger is the width of the 
confidence interval. Put differently, the larger the standard error of the estimator, the 
greater is the uncertainty of estimating the true value of the unknown parameter. Thus, 
the standard error of an estimator is often described as a measure of the precision of the 
estimator (i.e., how precisely the estimator measures the true population value). 

3 Some authors prefer to write Eq. (5.3.5) with the df explicitly indicated. Thus, they would write
$$\Pr[\hat\beta_2 - t_{(n-2),\alpha/2}\operatorname{se}(\hat\beta_2) \le \beta_2 \le \hat\beta_2 + t_{(n-2),\alpha/2}\operatorname{se}(\hat\beta_2)] = 1 - \alpha$$
But for simplicity we will stick to our notation; the context clarifies the appropriate df involved.
Returning to our regression example in Chapter 3 (Section 3.6) of mean hourly wages ($Y$) on education ($X$), recall that we found in Table 3.2 that $\hat\beta_2 = 0.7240$; $\operatorname{se}(\hat\beta_2) = 0.0700$. Since there are 13 observations, the degrees of freedom (df) are 11. If we assume $\alpha = 5\%$, that is, a 95% confidence coefficient, then the $t$ table shows that for 11 df the critical $t_{\alpha/2} = 2.201$. Substituting these values in Eq. (5.3.5), the reader should verify that the 95 percent confidence interval for $\beta_2$ is as follows:⁴

$$0.5700 \le \beta_2 \le 0.8780 \qquad (5.3.9)$$


Or, using Eq. (5.3.6), it is

$$0.7240 \pm 2.201(0.0700)$$

that is,

$$0.7240 \pm 0.1540 \qquad (5.3.10)$$
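The arithmetic of Eq. (5.3.6) is easy to script. A minimal sketch (the function name `t_interval` is illustrative) using the three numbers quoted above: $\hat\beta_2 = 0.7240$, $\operatorname{se}(\hat\beta_2) = 0.0700$, and $t_{0.025} = 2.201$ for 11 df.

```python
def t_interval(estimate: float, std_error: float, t_crit: float) -> tuple[float, float]:
    """Return the 100(1 - alpha)% interval estimate +/- t_crit * se, as in Eq. (5.3.6)."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# Wages-education slope: b2 = 0.7240, se(b2) = 0.0700, t_{0.025} with 11 df = 2.201.
lower, upper = t_interval(0.7240, 0.0700, 2.201)
print(f"{lower:.4f} <= beta2 <= {upper:.4f}")
```

It prints 0.5699 and 0.8781; the last digit differs from Eq. (5.3.9) only because the half-width 0.15407 is not first rounded to 0.1540.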

The interpretation of this confidence interval is: Given the confidence coefficient of 95 percent, in 95 out of 100 cases intervals like Equation 5.3.9 will contain the true $\beta_2$. But, as warned earlier, we cannot say that the probability is 95 percent that the specific interval in Eq. (5.3.9) contains the true $\beta_2$ because this interval is now fixed and no longer random; therefore $\beta_2$ either lies in it or it does not: The probability that the specified fixed interval includes the true $\beta_2$ is therefore 1 or 0.

Following Eq. (5.3.7), and the data in Table 3.2, the reader can easily verify that the 95 percent confidence interval for $\beta_1$ for our example is

$$-1.8871 \le \beta_1 \le 1.8583 \qquad (5.3.11)$$

Again you should be careful in interpreting this confidence interval. In 95 out of 100 cases, intervals like Equation 5.3.11 will contain the true $\beta_1$; the probability that this particular fixed interval includes the true $\beta_1$ is either 1 or 0.


Confidence Interval for $\beta_1$ and $\beta_2$ Simultaneously

There are occasions when one needs to construct a joint confidence interval for $\beta_1$ and $\beta_2$ such that with a confidence coefficient $(1 - \alpha)$, say, 95 percent, that interval includes $\beta_1$ and $\beta_2$ simultaneously. Since this topic is involved, the interested reader may want to consult appropriate references.⁵ We will touch on this topic briefly in Chapters 8 and 10.


5.4 Confidence Interval for $\sigma^2$

As pointed out in Chapter 4, Section 4.3, under the normality assumption, the variable

$$\chi^2 = (n - 2)\frac{\hat\sigma^2}{\sigma^2} \qquad (5.4.1)$$


4 Because of rounding errors in Table 3.2, the answers given below may not exactly match the 
answers obtained from a statistical package. 

5 For an accessible discussion, see John Neter, William Wasserman, and Michael H. Kutner, Applied Linear Regression Models, Richard D. Irwin, Homewood, Ill., 1983, Chap. 5.
[Figure 5.1: The 95% confidence interval for $\chi^2$ (11 df). The density $f(\chi^2)$ is plotted with the critical values $\chi^2_{0.975}$ and $\chi^2_{0.025}$ marked.]

follows the $\chi^2$ distribution with $n - 2$ df.⁶ Therefore, we can use the $\chi^2$ distribution to establish a confidence interval for $\sigma^2$:

$$\Pr(\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}) = 1 - \alpha \qquad (5.4.2)$$

where the $\chi^2$ value in the middle of this double inequality is as given by Equation 5.4.1 and where $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ are two values of $\chi^2$ (the critical $\chi^2$ values) obtained from the chi-square table for $n - 2$ df in such a manner that they cut off $100(\alpha/2)$ percent tail areas of the $\chi^2$ distribution, as shown in Figure 5.1.

Substituting $\chi^2$ from Eq. (5.4.1) into Equation 5.4.2 and rearranging the terms, we obtain

$$\Pr\left[(n - 2)\frac{\hat\sigma^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le (n - 2)\frac{\hat\sigma^2}{\chi^2_{1-\alpha/2}}\right] = 1 - \alpha \qquad (5.4.3)$$

which gives the $100(1 - \alpha)\%$ confidence interval for $\sigma^2$.

Continuing with our wages-education example, we found in Table 3.2 that for our data we have $\hat\sigma^2 = 0.8936$. If we choose $\alpha$ of 5%, the chi-square table for 11 df gives the following critical values: $\chi^2_{0.025} = 21.9200$ and $\chi^2_{0.975} = 3.8157$. These values show that the probability of a chi-square value exceeding 21.9200 is 2.5 percent and that of 3.8157 is 97.5 percent. Therefore, the interval between these two values is the 95 percent confidence interval for $\chi^2$, as shown in Figure 5.1. (Note the skewed characteristic of the chi-square distribution.)

Substituting the data of our example into Eq. (5.4.3), the reader can verify that the 95 percent confidence interval for $\sigma^2$ is as follows:

$$0.4484 \le \sigma^2 \le 2.5760 \qquad (5.4.4)$$

The interpretation of this interval is: If we establish 95 percent confidence limits on $\sigma^2$ and if we maintain a priori that these limits will include the true $\sigma^2$, we will be right in the long run 95 percent of the time.
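Eq. (5.4.3) is likewise a two-line computation once the critical values are read from the chi-square table. The sketch below (function and parameter names are illustrative) takes $\hat\sigma^2 = 0.8936$, 11 df, $\chi^2_{0.025} = 21.9200$, and $\chi^2_{0.975} = 3.8157$ from the example above; note that the larger critical value produces the lower limit.

```python
def variance_interval(sigma2_hat: float, df: int,
                      chi2_alpha_half: float, chi2_one_minus_alpha_half: float) -> tuple[float, float]:
    """Eq. (5.4.3): [df * sigma2_hat / chi2_{alpha/2}, df * sigma2_hat / chi2_{1 - alpha/2}]."""
    scaled = df * sigma2_hat  # (n - 2) * sigma_hat^2
    return scaled / chi2_alpha_half, scaled / chi2_one_minus_alpha_half


# Wages-education example: sigma2_hat = 0.8936 with 11 df, critical values from the text.
low, high = variance_interval(0.8936, 11, chi2_alpha_half=21.9200, chi2_one_minus_alpha_half=3.8157)
print(f"{low:.4f} <= sigma^2 <= {high:.4f}")
```

The result matches Eq. (5.4.4) up to rounding in the last digit of the upper limit.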

6 For proof, see Robert V. Hogg and Allen T. Craig, Introduction to Mathematical Statistics, 2d ed., 
Macmillan, New York, 1965, p. 144. 
5.5 Hypothesis Testing: General Comments

Having discussed the problem of point and interval estimation, we shall now consider the 
topic of hypothesis testing. In this section we discuss briefly some general aspects of this 
topic; Appendix A gives some additional details. 

The problem of statistical hypothesis testing may be stated simply as follows: Is a given observation or finding compatible with some stated hypothesis or not? The word “compatible,” as used here, means “sufficiently” close to the hypothesized value so that we do not reject the stated hypothesis. Thus, if some theory or prior experience leads us to believe that the true slope coefficient $\beta_2$ of the wages-education example is unity, is the observed $\hat\beta_2 = 0.724$ obtained from the sample of Table 3.2 consistent with the stated hypothesis? If it is, we do not reject the hypothesis; otherwise, we may reject it.

In the language of statistics, the stated hypothesis is known as the null hypothesis and is denoted by the symbol $H_0$. The null hypothesis is usually tested against an alternative hypothesis (also known as maintained hypothesis) denoted by $H_1$, which may state, for example, that true $\beta_2$ is different from unity. The alternative hypothesis may be simple or composite.⁷ For example, $H_1\colon \beta_2 = 1.5$ is a simple hypothesis, but $H_1\colon \beta_2 \ne 1.5$ is a composite hypothesis.

The theory of hypothesis testing is concerned with developing rules or procedures for deciding whether to reject or not reject the null hypothesis. There are two mutually complementary approaches for devising such rules, namely, confidence interval and test of significance. Both these approaches predicate that the variable (statistic or estimator) under consideration has some probability distribution and that hypothesis testing involves making statements or assertions about the value(s) of the parameter(s) of such distribution. For example, we know that with the normality assumption $\hat\beta_2$ is normally distributed with mean equal to $\beta_2$ and variance given by Eq. (4.3.5). If we hypothesize that $\beta_2 = 1$, we are making an assertion about one of the parameters of the normal distribution, namely, the mean. Most of the statistical hypotheses encountered in this text will be of this type: making assertions about one or more values of the parameters of some assumed probability distribution such as the normal, $F$, $t$, or $\chi^2$. How this is accomplished is discussed in the following two sections.

5.6 Hypothesis Testing: The Confidence-Interval Approach 

Two-Sided or Two-Tail Test 

To illustrate the confidence interval approach, once again we revert to our wages-education example. From the regression results given in Eq. (3.6.1), we know that the slope coefficient is 0.7240. Suppose we postulate that

$$H_0\colon \beta_2 = 0.5 \quad \text{and} \quad H_1\colon \beta_2 \ne 0.5$$

that is, the true slope coefficient is 0.5 under the null hypothesis but less than or greater than 0.5 under the alternative hypothesis. The null hypothesis is a simple hypothesis, whereas the alternative hypothesis is composite; actually it is what is known as a two-sided hypothesis. Very often such a two-sided alternative hypothesis reflects the fact that we do not have a strong a priori or theoretical expectation about the direction in which the alternative hypothesis should move from the null hypothesis.

7 A statistical hypothesis is called a simple hypothesis if it specifies the precise value(s) of the parameter(s) of a probability density function; otherwise, it is called a composite hypothesis. For example, in the normal pdf $(1/\sigma\sqrt{2\pi})\exp\{-\tfrac{1}{2}[(X - \mu)/\sigma]^2\}$, if we assert that $H_1\colon \mu = 15$ and $\sigma = 2$, it is a simple hypothesis; but if $H_1\colon \mu = 15$ and $\sigma > 15$, it is a composite hypothesis, because the standard deviation does not have a specific value.

[Figure 5.2]

Is the observed $\hat\beta_2$ compatible with $H_0$? To answer this question, let us refer to the confidence interval in Eq. (5.3.9). We know that in the long run intervals like (0.5700, 0.8780) will contain the true $\beta_2$ with 95 percent probability. Consequently, in the long run (i.e., repeated sampling) such intervals provide a range or limits within which the true $\beta_2$ may lie with a confidence coefficient of, say, 95 percent. Thus, the confidence interval provides a set of plausible null hypotheses. Therefore, if $\beta_2$ under $H_0$ falls within the $100(1 - \alpha)\%$ confidence interval, we do not reject the null hypothesis; if it lies outside the interval, we may reject it.⁸ This range is illustrated schematically in Figure 5.2.


Decision Rule: Construct a $100(1 - \alpha)\%$ confidence interval for $\beta_2$. If the $\beta_2$ under $H_0$ falls within this confidence interval, do not reject $H_0$, but if it falls outside this interval, reject $H_0$.


Following this rule, for our hypothetical example, $H_0\colon \beta_2 = 0.5$ clearly lies outside the
95 percent confidence interval given in Eq. (5.3.9). Therefore, we can reject the hypothesis 
that the true slope is 0.5, with 95 percent confidence. If the null hypothesis were true, the 
probability of our obtaining a value of slope of as much as 0.7240 by sheer chance or fluke 
is at the most about 5 percent, a small probability. 
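The decision rule can be sketched in a few lines of Python. This is a minimal illustration, not a procedure given in the text: it assumes the wages-education figures quoted in this section ($\hat\beta_2 = 0.7240$, $\operatorname{se}(\hat\beta_2) = 0.0700$, 11 df) and hard-codes the tabulated critical value $t_{0.025} = 2.201$ for 11 df rather than computing it. The function names `confidence_interval` and `reject_by_interval` are hypothetical.

```python
def confidence_interval(beta_hat, se, t_crit):
    """100(1 - alpha)% interval: beta_hat +/- t_crit * se."""
    half_width = t_crit * se
    return beta_hat - half_width, beta_hat + half_width


def reject_by_interval(beta_null, interval):
    """Decision rule: reject H0 if the hypothesized value lies outside the interval."""
    lower, upper = interval
    return not (lower <= beta_null <= upper)


# Wages-education figures quoted in the text; 2.201 is t_{0.025} for 11 df.
ci = confidence_interval(0.7240, 0.0700, 2.201)
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
print("Reject H0: beta2 = 0.5?", reject_by_interval(0.5, ci))  # True
```

Because 2.201 is itself rounded, the computed limits differ from (0.5700, 0.8780) only in the fourth decimal place.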

In statistics, when we reject the null hypothesis, we say that our finding is statistically 
significant. On the other hand, when we do not reject the null hypothesis, we say that our 
finding is not statistically significant. 

Some authors use a phrase such as “highly statistically significant.” By this they usually
mean that when they reject the null hypothesis, the probability of committing a Type I error
(i.e., $\alpha$) is a small number, usually 1 percent. But as our discussion of the p value in
Section 5.8 will show, it is better to leave it to the researcher to decide whether a statistical
finding is “significant,” “moderately significant,” or “highly significant.”


8 Always bear in mind that there is a $100\alpha$ percent chance that the confidence interval does not
contain $\beta_2$ under $H_0$ even though the hypothesis is correct. In short, there is a $100\alpha$ percent chance
of committing a Type I error. Thus, if $\alpha = 0.05$, there is a 5 percent chance that we could reject the
null hypothesis even though it is true.
One-Sided or One-Tail Test

Sometimes we have a strong a priori or theoretical expectation (or expectations based on 
some previous empirical work) that the alternative hypothesis is one-sided or unidirectional 
rather than two-sided, as just discussed. Thus, for our wages-education example, one could 
postulate that 

$H_0: \beta_2 \le 0.5$ and $H_1: \beta_2 > 0.5$

Perhaps economic theory or prior empirical work suggests that the slope is greater than 0.5.
Although the procedure to test this hypothesis can be easily derived from Eq. (5.3.5), the
actual mechanics are better explained in terms of the test-of-significance approach discussed
next. 9


5.7 Hypothesis Testing: The Test-of-Significance Approach 


Testing the Significance of Regression Coefficients: The t Test 

An alternative but complementary approach to the confidence-interval method of testing
statistical hypotheses is the test-of-significance approach developed along independent
lines by R. A. Fisher and jointly by Neyman and Pearson. 10 Broadly speaking, a test of
significance is a procedure by which sample results are used to verify the truth or falsity
of a null hypothesis. The key idea behind tests of significance is that of a test statistic
(estimator) and the sampling distribution of such a statistic under the null hypothesis. The
decision to accept or reject $H_0$ is made on the basis of the value of the test statistic obtained
from the data at hand.

As an illustration, recall that under the normality assumption the variable

$$t = \frac{\hat\beta_2 - \beta_2}{\operatorname{se}(\hat\beta_2)} \qquad (5.3.2)$$


follows the t distribution with $n-2$ df. If the value of true $\beta_2$ is specified under the null
hypothesis, the t value of Eq. (5.3.2) can readily be computed from the available sample, and
therefore it can serve as a test statistic. And since this test statistic follows the t distribution,
confidence-interval statements such as the following can be made:


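$$\Pr\left[-t_{\alpha/2} \le \frac{\hat\beta_2 - \beta_2^*}{\operatorname{se}(\hat\beta_2)} \le t_{\alpha/2}\right] = 1 - \alpha \qquad (5.7.1)$$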

where $\beta_2^*$ is the value of $\beta_2$ under $H_0$ and where $-t_{\alpha/2}$ and $t_{\alpha/2}$ are the values of t (the
critical t values) obtained from the t table for the $\alpha/2$ level of significance and $n-2$ df
[cf. Eq. (5.3.4)]. The t table is given in Appendix D.


9 If you want to use the confidence-interval approach, construct a $100(1-\alpha)\%$ one-sided or one-tail
confidence interval for $\beta_2$. Why?

10 Details may be found in E. L. Lehmann, Testing Statistical Hypotheses, John Wiley & Sons, New York,
1959.
Rearranging Equation 5.7.1, we obtain


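$$\Pr\left[\beta_2^* - t_{\alpha/2}\operatorname{se}(\hat\beta_2) \le \hat\beta_2 \le \beta_2^* + t_{\alpha/2}\operatorname{se}(\hat\beta_2)\right] = 1 - \alpha \qquad (5.7.2)$$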


which gives the interval in which $\hat\beta_2$ will fall with probability $1-\alpha$, given $\beta_2 = \beta_2^*$. In the
language of hypothesis testing, the $100(1-\alpha)\%$ confidence interval established in
Equation 5.7.2 is known as the region of acceptance (of the null hypothesis), and the region(s)
outside the confidence interval is (are) called the region(s) of rejection (of $H_0$) or the
critical region(s). As noted previously, the confidence limits, the endpoints of the
confidence interval, are also called critical values.

The intimate connection between the confidence-interval and test-of-significance
approaches to hypothesis testing can now be seen by comparing Eq. (5.3.5) with Eq. (5.7.2).
In the confidence-interval procedure we try to establish a range or an interval that has a
certain probability of including the true but unknown $\beta_2$, whereas in the test-of-significance
approach we hypothesize some value for $\beta_2$ and try to see whether the computed $\hat\beta_2$ lies
within reasonable (confidence) limits around the hypothesized value.

Once again let us return to our wages-education example. We know that $\hat\beta_2 = 0.7240$,
$\operatorname{se}(\hat\beta_2) = 0.0700$, and df = 11. If we assume $\alpha = 5\%$, $t_{\alpha/2} = 2.201$.

If we assume $H_0: \beta_2 = \beta_2^* = 0.5$ and $H_1: \beta_2 \ne 0.5$, Eq. (5.7.2) becomes

$$\Pr(0.3460 \le \hat\beta_2 \le 0.6540) = 0.95 \qquad (5.7.3)$$ 11


as shown diagrammatically in Figure 5.3. 

In practice, there is no need to estimate Eq. (5.7.2) explicitly. One can compute the 
t value in the middle of the double inequality given by Eq. (5.7.1) and see whether it lies 
between the critical t values or outside them. For our example, 


$$t = \frac{0.7240 - 0.5}{0.0700} = 3.2 \qquad (5.7.4)$$


which clearly lies in the critical region of Figure 5.4. The conclusion remains the same; 
namely, we reject H 0 . 
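The equivalence of the two formulations, the acceptance region of Eq. (5.7.2) and the t value of Eq. (5.7.4), can be checked numerically. The sketch below assumes the same wages-education figures and the same hard-coded critical value (2.201 for 11 df); `acceptance_region` and `t_statistic` are illustrative names, not functions from any library.

```python
def t_statistic(beta_hat, beta_null, se):
    """t value as in Eq. (5.7.4): (beta_hat - beta_null) / se."""
    return (beta_hat - beta_null) / se


def acceptance_region(beta_null, se, t_crit):
    """Interval of Eq. (5.7.2): beta_hat falls here with prob. 1 - alpha under H0."""
    return beta_null - t_crit * se, beta_null + t_crit * se


beta_hat, se, t_crit = 0.7240, 0.0700, 2.201  # t_crit = t_{0.025} for 11 df

low, high = acceptance_region(0.5, se, t_crit)
t = t_statistic(beta_hat, 0.5, se)

# Two equivalent ways of reaching the same decision
outside_region = not (low <= beta_hat <= high)
in_critical_region = abs(t) > t_crit
print(f"acceptance region: ({low:.3f}, {high:.3f})")  # (0.346, 0.654)
print(f"t = {t:.1f}; reject H0: {in_critical_region}")  # t = 3.2; reject H0: True
```

Both checks must always agree, because $|t| > t_{\alpha/2}$ holds exactly when $\hat\beta_2$ lies outside the interval of Eq. (5.7.2).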


FIGURE 5.3 The 95% confidence interval for $\hat\beta_2$ under the hypothesis that $\beta_2 = 0.5$.


11 In Sec. 5.2, point 4, it was stated that we cannot say that the probability is 95 percent that the fixed
interval (0.5700, 0.8780) includes the true $\beta_2$. But we can make the probabilistic statement given in
Eq. (5.7.3) because $\hat\beta_2$, being an estimator, is a random variable.
Notice that if the estimated $\hat\beta_2$ is equal to the hypothesized $\beta_2^*$, the t value computed as in
Equation 5.7.4 will be zero. However, as the estimated value departs from the hypothesized $\beta_2$
value, $|t|$ (that is, the absolute t value; note: t can be positive as well as negative) will be
increasingly large. Therefore, a “large” $|t|$ value will be evidence against the null hypothesis. Of
course, we can always use the t table to determine whether a particular t value is large or small;
the answer, as we know, depends on the degrees of freedom as well as on the probability of
Type I error that we are willing to accept. If you take a look at the t table given in Appendix D
(Table D.2), you will observe that for any given value of df the probability of obtaining an
increasingly large $|t|$ value becomes progressively smaller. Thus, for 20 df the probability of
obtaining a $|t|$ value of 1.725 or greater is 0.10 or 10 percent, but for the same df the
probability of obtaining a $|t|$ value of 3.552 or greater is only 0.002 or 0.2 percent.
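The tail probabilities read from Table D.2 can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the method used to build the table: it writes out the Student t density with the standard library and integrates it over $[0, |t|]$ with Simpson's rule, which avoids depending on a statistics package. `t_pdf` and `two_tail_prob` are hypothetical helper names.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma keeps the normalizing constant finite even for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tail_prob(t, df, steps=2000):
    """P(|T| >= |t|) via Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(-|t| <= T <= |t|), using symmetry
    return 1 - central


for t in (1.725, 3.552):
    print(f"P(|T| >= {t}) with 20 df: {two_tail_prob(t, 20):.3f}")
```

Run as written, this prints values of about 0.100 and 0.002, in line with the probabilities quoted above for 20 df.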

Since we use the t distribution, the preceding testing procedure is appropriately called
the t test. In the language of significance tests, a statistic is said to be statistically
significant if the value of the test statistic lies in the critical region. In this case the null
hypothesis is rejected. By the same token, a test is said to be statistically insignificant
if the value of the test statistic lies in the acceptance region. In this situation, the null
hypothesis is not rejected. In our example, the t test is significant and hence we reject the null
hypothesis.

Before concluding our discussion of hypothesis testing, note that the testing procedure
just outlined is known as a two-sided, or two-tail, test-of-significance procedure in that we
consider the two extreme tails of the relevant probability distribution, the rejection
regions, and reject the null hypothesis if the test statistic lies in either tail. But this happens because our
$H_1$ was a two-sided composite hypothesis; $\beta_2 \ne 0.5$ means $\beta_2$ is either greater than or less
than 0.5. But suppose prior experience suggests to us that the slope is expected to be greater
than 0.5. In this case we have $H_0: \beta_2 \le 0.5$ and $H_1: \beta_2 > 0.5$. Although $H_1$ is still a
composite hypothesis, it is now one-sided. To test this hypothesis, we use the one-tail test (the
right tail), as shown in Figure 5.5. (See also the discussion in Section 5.6.)

The test procedure is the same as before except that the upper confidence limit or
critical value now corresponds to $t_\alpha = t_{0.05}$, that is, the 5 percent level. As Figure 5.5 shows, we
need not consider the lower tail of the t distribution in this case. Whether one uses a two- or
one-tail test of significance will depend upon how the alternative hypothesis is formulated,
which, in turn, may depend upon some a priori considerations or prior empirical
experience. (But more on this in Section 5.8.)

We can summarize the t test of significance approach to hypothesis testing as shown in 
Table 5.1. 
FIGURE 5.5 One-tail test of significance.


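TABLE 5.1 The t Test of Significance: Decision Rules

Type of Hypothesis | $H_0$: The Null Hypothesis | $H_1$: The Alternative Hypothesis | Decision Rule: Reject $H_0$ If
Two-tail | $\beta_2 = \beta_2^*$ | $\beta_2 \ne \beta_2^*$ | $\lvert t\rvert > t_{\alpha/2,\,df}$
Right-tail | $\beta_2 \le \beta_2^*$ | $\beta_2 > \beta_2^*$ | $t > t_{\alpha,\,df}$
Left-tail | $\beta_2 \ge \beta_2^*$ | $\beta_2 < \beta_2^*$ | $t < -t_{\alpha,\,df}$

Notes: $\beta_2^*$ is the hypothesized numerical value of $\beta_2$.
$\lvert t\rvert$ means the absolute value of t.
$t_\alpha$ or $t_{\alpha/2}$ means the critical t value at the $\alpha$ or $\alpha/2$ level of significance.
df: degrees of freedom, $(n-2)$ for the two-variable model, $(n-3)$ for the three-variable model, and so on.
The same procedure holds to test hypotheses about $\beta_1$.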
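The decision rules of Table 5.1 translate directly into a small Python function. This is a sketch under stated assumptions: the critical values are supplied by the caller, and the two used here (2.201 for $t_{0.025}$ and 1.796 for $t_{0.05}$, both with 11 df) are taken from a standard t table; `t_test_decision` is a hypothetical name. Applied to the wages-education t value of 3.2, both the two-tail test and the right-tail test of $H_1: \beta_2 > 0.5$ reject $H_0$.

```python
def t_test_decision(t, t_crit, tail):
    """Return True if H0 is rejected under the rules of Table 5.1.

    t_crit must match the test: t_{alpha/2, df} for a two-tail test,
    t_{alpha, df} for a right- or left-tail test.
    """
    if tail == "two":
        return abs(t) > t_crit
    if tail == "right":
        return t > t_crit
    if tail == "left":
        return t < -t_crit
    raise ValueError("tail must be 'two', 'right', or 'left'")


t = (0.7240 - 0.5) / 0.0700  # t value of Eq. (5.7.4), 11 df
print(t_test_decision(t, 2.201, "two"))    # True:  |t| > t_{0.025, 11}
print(t_test_decision(t, 1.796, "right"))  # True:  t > t_{0.05, 11}
print(t_test_decision(t, 1.796, "left"))   # False: t is not below -1.796
```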


Testing the Significance of $\sigma^2$: The $\chi^2$ Test

As another illustration of the test-of-significance methodology, consider the following
variable:

$$\chi^2 = (n-2)\frac{\hat\sigma^2}{\sigma^2} \qquad (5.4.1)$$

which, as noted previously, follows the $\chi^2$ distribution with $n-2$ df. For our example,
$\hat\sigma^2 = 0.8937$ and df = 11. If we postulate that $H_0: \sigma^2 = 0.6$ versus $H_1: \sigma^2 \ne 0.6$,
Equation 5.4.1 provides the test statistic for $H_0$. Substituting the appropriate values in Eq. (5.4.1),
it can be found that under $H_0$, $\chi^2 = 16.3845$. If we assume $\alpha = 5\%$, the critical $\chi^2$ values
are 3.81575 and 21.9200. Since the computed $\chi^2$ lies between these limits, the data support
the null hypothesis and we do not reject it. (See Figure 5.1.) This test procedure is called the
chi-square test of significance. The $\chi^2$ test of significance approach to hypothesis testing
is summarized in Table 5.2.

TABLE 5.2 A Summary of the $\chi^2$ Test

$H_0$: The Null Hypothesis | $H_1$: The Alternative Hypothesis | Critical Region: Reject $H_0$ If
$\sigma^2 = \sigma_0^2$ | $\sigma^2 > \sigma_0^2$ | $df(\hat\sigma^2)/\sigma_0^2 > \chi^2_{\alpha,\,df}$
$\sigma^2 = \sigma_0^2$ | $\sigma^2 < \sigma_0^2$ | $df(\hat\sigma^2)/\sigma_0^2 < \chi^2_{(1-\alpha),\,df}$
$\sigma^2 = \sigma_0^2$ | $\sigma^2 \ne \sigma_0^2$ | $df(\hat\sigma^2)/\sigma_0^2 > \chi^2_{\alpha/2,\,df}$ or $< \chi^2_{(1-\alpha/2),\,df}$

Note: $\sigma_0^2$ is the value of $\sigma^2$ under the null hypothesis. The first subscript on $\chi^2$ in the last column is the level of significance, and
the second subscript is the degrees of freedom. These are critical chi-square values. Note that df is $(n-2)$ for the two-variable
regression model, $(n-3)$ for the three-variable regression model, and so on.
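The chi-square test just described can be sketched in Python as well. As in the text, $\hat\sigma^2 = 0.8937$, df = 11, and $H_0: \sigma^2 = 0.6$; the two critical values are taken from a chi-square table rather than computed, and `chi_square_stat` is a hypothetical helper.

```python
def chi_square_stat(df, sigma2_hat, sigma2_null):
    """Test statistic of Eq. (5.4.1): df * sigma2_hat / sigma2_null."""
    return df * sigma2_hat / sigma2_null


chi2 = chi_square_stat(df=11, sigma2_hat=0.8937, sigma2_null=0.6)

# Two-tail test at alpha = 5%: critical values chi2_{0.975, 11} and chi2_{0.025, 11}
lower, upper = 3.81575, 21.9200
reject = chi2 < lower or chi2 > upper
print(f"chi-square = {chi2:.4f}; reject H0: {reject}")  # chi-square = 16.3845; reject H0: False
```

Unlike the t statistic, the $\chi^2$ distribution is not symmetric, so the two critical values are not mirror images of each other.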


5.8 Hypothesis Testing: Some Practical Aspects 

The Meaning of "Accepting" or "Rejecting" a Hypothesis 

If, on the basis of a test of significance, say, the t test, we decide to “accept” the null
hypothesis, all we are saying is that on the basis of the sample evidence we have no reason
to reject it; we are not saying that the null hypothesis is true beyond any doubt. Why? To
answer this, let us return to our wages-education example and assume that $H_0: \beta_2 = 0.70$.
Now the estimated value of the slope is $\hat\beta_2 = 0.7241$ with $\operatorname{se}(\hat\beta_2) = 0.0701$. Then on the
basis of the t test we find that $t = (0.7241 - 0.7)/0.0701 = 0.3438$, which is insignificant, say, at
$\alpha = 5\%$. Therefore, we say “accept” $H_0$. But now let us assume $H_0: \beta_2 = 0.6$. Applying
the t test again, we obtain $t = (0.7241 - 0.6)/0.0701 = 1.7703$, which is also statistically
insignificant. So now we say “accept” this $H_0$. Which of these two null hypotheses is the
“truth”? We do not know. Therefore, in “accepting” a null hypothesis we should always be
aware that another null hypothesis may be equally compatible with the data. It is therefore
preferable to say that we may accept the null hypothesis rather than we (do) accept it. Better
still,


... just as a court pronounces a verdict as “not guilty” rather than “innocent,” so the
conclusion of a statistical test is “do not reject” rather than “accept.” 12
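To see concretely why “do not reject” is the safer wording, the short Python sketch below tests both null hypotheses and then scans a grid of hypothesized values. It is an illustration under the same assumptions as the text ($\hat\beta_2 = 0.7241$, $\operatorname{se}(\hat\beta_2) = 0.0701$, 11 df) plus the tabulated two-tail 5 percent critical value 2.201. Every value on the grid from 0.57 to 0.87 survives the test, so many different null hypotheses are “accepted” by the same data.

```python
beta_hat, se = 0.7241, 0.0701
t_crit = 2.201  # two-tail 5% critical value for 11 df, from the t table

t_values = {}
for beta_null in (0.70, 0.60):
    t = (beta_hat - beta_null) / se
    t_values[beta_null] = t
    verdict = "reject" if abs(t) > t_crit else "do not reject"
    print(f"H0: beta2 = {beta_null:.2f}  t = {t:.4f}  {verdict}")

# Every hypothesized value on a 0.01 grid that the same test would not reject
not_rejected = [b / 100 for b in range(50, 101)
                if abs((beta_hat - b / 100) / se) <= t_crit]
print(f"not rejected: {not_rejected[0]:.2f} to {not_rejected[-1]:.2f}")
```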


12 Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, p. 114.
The "Zero" Null Hypothesis and the "2-t" Rule of Thumb

A null hypothesis that is commonly tested in empirical work is $H_0: \beta_2 = 0$, that is, the slope
coefficient is zero. This “zero” null hypothesis is a kind of straw man, the objective being
to find out whether Y is related at all to X, the explanatory variable. If there is no
relationship between Y and X to begin with, then testing a hypothesis such as $\beta_2 = 0.3$ or any other
value is meaningless.

This null hypothesis can be easily tested by the confidence-interval or the t-test approach
discussed in the preceding sections. But very often such formal testing can be shortcut by
adopting the “2-t” rule of significance, which may be stated as follows:

“2-t” Rule of Thumb

If the number of degrees of freedom is 20 or more and if $\alpha$, the level of significance, is set
at 0.05, then the null hypothesis $\beta_2 = 0$ can be rejected if the t value $[= \hat\beta_2/\operatorname{se}(\hat\beta_2)]$
computed from Eq. (5.3.2) exceeds 2 in absolute value.


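The rationale for this rule is not too difficult to grasp. From Eq. (5.7.1) we know that we
will reject $H_0: \beta_2 = 0$ if

$$t = \hat\beta_2/\operatorname{se}(\hat\beta_2) > t_{\alpha/2} \quad \text{when } \hat\beta_2 > 0$$

or

$$t = \hat\beta_2/\operatorname{se}(\hat\beta_2) < -t_{\alpha/2} \quad \text{when } \hat\beta_2 < 0$$

or when

$$|t| = \left|\frac{\hat\beta_2}{\operatorname{se}(\hat\beta_2)}\right| > t_{\alpha/2} \qquad (5.8.1)$$

for the appropriate degrees of freedom.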

Now if we examine the t table given in Appendix D, we see that for df of about 20 or
more a computed t value in excess of 2 (in absolute terms), say, 2.1, is statistically
significant at the 5 percent level, implying rejection of the null hypothesis. Therefore, if we find
that for 20 or more df the computed t value is, say, 2.5 or 3, we do not even have to refer to
the t table to assess the significance of the estimated slope coefficient. Of course, one can
always refer to the t table to obtain the precise level of significance, and one should always
do so when the df are fewer than, say, 20.

In passing, note that if we are testing the one-sided hypothesis $\beta_2 = 0$ versus $\beta_2 > 0$ or
$\beta_2 < 0$, then we should reject the null hypothesis if

$$|t| = \left|\frac{\hat\beta_2}{\operatorname{se}(\hat\beta_2)}\right| > t_\alpha \qquad (5.8.2)$$

If we fix $\alpha$ at 0.05, then from the t table we observe that for 20 or more df a t value in excess
of 1.73 is statistically significant at the 5 percent level of significance (one-tail). Hence,
whenever a t value exceeds, say, 1.8 (in absolute terms) and the df are 20 or more, one need
not consult the t table for the statistical significance of the observed coefficient. Of course,
if we choose $\alpha$ at 0.01 or any other level, we will have to decide on the appropriate t value
as the benchmark value. But by now the reader should be able to do that.
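The rule of thumb can be contrasted with the exact tabulated test in a short Python sketch. The critical values in `T_CRIT_TWO_TAIL_5PCT` are standard two-tail 5 percent values from a t table (2.086, 2.042, 2.000, and 1.980 for 20, 30, 60, and 120 df); the function names are hypothetical. The borderline case shows that the rule is only an approximation: with 20 df, a t value of 2.05 passes the rule of thumb but falls short of the exact critical value.

```python
# Two-tail 5 percent critical values t_{0.025, df} from a standard t table
T_CRIT_TWO_TAIL_5PCT = {20: 2.086, 30: 2.042, 60: 2.000, 120: 1.980}


def two_t_rule(beta_hat, se, df):
    """Rule of thumb: reject H0: beta2 = 0 (alpha about 0.05) if |t| > 2."""
    if df < 20:
        raise ValueError("for fewer than 20 df, consult the t table instead")
    return abs(beta_hat / se) > 2


def exact_decision(beta_hat, se, df):
    """Same test using the tabulated critical value (df must be a key above)."""
    return abs(beta_hat / se) > T_CRIT_TWO_TAIL_5PCT[df]


print(two_t_rule(2.5, 1.0, 30), exact_decision(2.5, 1.0, 30))    # True True
print(two_t_rule(2.05, 1.0, 20), exact_decision(2.05, 1.0, 20))  # True False
```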
Forming the Null and Alternative Hypotheses 13

Given the null and the alternative hypotheses, testing them for statistical significance
should no longer be a mystery. But how does one formulate these hypotheses? There are no
hard-and-fast rules. Very often the phenomenon under study will suggest the nature of the
null and alternative hypotheses. For example, consider the capital market line (CML) of
portfolio theory, which postulates that $E_i = \beta_1 + \beta_2 \sigma_i$, where E = expected return on
portfolio and $\sigma$ = the standard deviation of return, a measure of risk. Since return and risk
are expected to be positively related (the higher the risk, the higher the return), the natural
alternative hypothesis to the null hypothesis that $\beta_2 = 0$ would be $\beta_2 > 0$. That is, one
would not choose to consider values of $\beta_2$ less than zero.

But consider the case of the demand for money. As we shall show later, one of the
important determinants of the demand for money is income. Prior studies of the money
demand functions have shown that the income elasticity of demand for money (the percent
change in the demand for money for a 1 percent change in income) has typically ranged
between 0.7 and 1.3. Therefore, in a new study of demand for money, if one postulates that
the income-elasticity coefficient $\beta_2$ is 1, the alternative hypothesis could be that $\beta_2 \ne 1$, a
two-sided alternative hypothesis.

Thus, theoretical expectations or prior empirical work or both can be relied upon to
formulate hypotheses. But no matter how the hypotheses are formed, it is extremely
important that the researcher establish these hypotheses before carrying out the empirical
investigation. Otherwise, he or she will be guilty of circular reasoning or self-fulfilling prophecies.
That is, if one were to formulate hypotheses after examining the empirical results, there may
be the temptation to form hypotheses that justify one’s results. Such a practice should be
avoided at all costs, at least for the sake of scientific objectivity. Keep in mind the Stigler
quotation given at the beginning of this chapter!


Choosing $\alpha$, the Level of Significance

It should be clear from the discussion so far that whether we reject or do not reject the null
hypothesis depends critically on $\alpha$, the level of significance or the probability of committing
a Type I error, that is, the probability of rejecting the true hypothesis. In Appendix A we discuss
fully the nature of a Type I error, its relationship to a Type II error (the probability of
accepting the false hypothesis), and why classical statistics generally concentrates on a
Type I error. But even then, why is $\alpha$ commonly fixed at the 1, 5, or, at the most, 10 percent
levels? As a matter of fact, there is nothing sacrosanct about these values; any other values
will do just as well.

In an introductory book like this it is not possible to discuss in depth why one chooses the
1, 5, or 10 percent levels of significance, for that will take us into the field of statistical
decision making, a discipline unto itself. A brief summary, however, can be offered. As we
discuss in Appendix A, for a given sample size, if we try to reduce a Type I error, a Type II
error increases, and vice versa. That is, given the sample size, if we try to reduce the
probability of rejecting the true hypothesis, we at the same time increase the probability of
accepting the false hypothesis. So there is a trade-off involved between these two types of errors,


13 For an interesting discussion about formulating hypotheses, see J. Bradford De Long and Kevin
Lang, "Are All Economic Hypotheses False?" Journal of Political Economy, vol. 100, no. 6, 1992,
pp. 1257-1272.
given the sample size. Now the only way we can decide about the trade-off is to find out the
relative costs of the two types of errors. Then,

If the error of rejecting the null hypothesis which is in fact true (Error Type I) is costly relative
to the error of not rejecting the null hypothesis which is in fact false (Error Type II), it will be
rational to set the probability of the first kind of error low. If, on the other hand, the cost of
making Error Type I is low relative to the cost of making Error Type II, it will pay to make the
probability of the first kind of error high (thus making the probability of the second type of
error low). 14

Of course, the rub is that we rarely know the costs of making the two types of errors. Thus,
applied econometricians generally follow the practice of setting the value of $\alpha$ at a 1 or a 5
or at most a 10 percent level and choose a test statistic that would make the probability of
committing a Type II error as small as possible. Since one minus the probability of
committing a Type II error is known as the power of the test, this procedure amounts to
maximizing the power of the test. (See Appendix A for a discussion of the power of a test.)

Fortunately, the dilemma of choosing the appropriate value of $\alpha$ can be avoided by using
what is known as the p value of the test statistic, which is discussed next.


The Exact Level of Significance: The p Value 

As just noted, the Achilles heel of the classical approach to hypothesis testing is its
arbitrariness in selecting $\alpha$. Once a test statistic (e.g., the t statistic) is obtained in a given
example, why not simply go to the appropriate statistical table and find out the actual
probability of obtaining a value of the test statistic as much as or greater than that obtained in
the example? This probability is called the p value (i.e., probability value), also known as
the observed or exact level of significance or the exact probability of committing a Type
I error. More technically, the p value is defined as the lowest significance level at which
a null hypothesis can be rejected.

To illustrate, let us return to our wages-education example. Given the null hypothesis
that the true coefficient of education is 0.5, we obtained a t value of 3.2 in Eq. (5.7.4). What
is the p value of obtaining a t value of as much as or greater than 3.2? Looking up the t table
given in Appendix D, we observe that for 11 df the probability of obtaining such a t value
must be smaller than 0.005 (one-tail) or 0.010 (two-tail).

If you use Stata or EViews statistical packages, you will find that the p value of obtaining
a t value of 3.2 or greater with 11 df is about 0.0042 (one-tail), or about 0.0085 for $|t| \ge 3.2$
(two-tail), that is, very small. This is the p value of the observed t statistic. This exact level of
significance of the t statistic is smaller than even the smallest of the conventionally, and arbitrarily,
fixed levels of significance, such as 1, 5, or 10 percent. As a matter of fact, if we were to use the
p value just computed and reject the null hypothesis that the true coefficient of education is 0.5,
the probability of our committing a Type I error would be less than 1 in 100.

As we noted earlier, if the data do not support the null hypothesis, $|t|$ obtained under the
null hypothesis will be “large” and therefore the p value of obtaining such a $|t|$ value will
be “small.” In other words, for a given sample size, as $|t|$ increases, the p value decreases,
and one can therefore reject the null hypothesis with increasing confidence.
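The p value for $t = 3.2$ with 11 df can be reproduced without a statistics package. The sketch below is an illustration only, not the algorithm used by Stata or EViews: it integrates the Student t density numerically with Simpson's rule and uses the symmetry of the distribution to obtain the upper-tail area. The helper names `t_pdf` and `p_values` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def p_values(t, df, steps=2000):
    """Return (one-tail, two-tail) p values for an observed t via Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper_tail = 0.5 - s * h / 3  # P(T >= |t|), since half the mass lies above 0
    return upper_tail, 2 * upper_tail


one_tail, two_tail = p_values(3.2, 11)
print(f"one-tail p = {one_tail:.4f}, two-tail p = {two_tail:.4f}")
```

Both values fall inside the bounds read from the t table (below 0.005 one-tail and below 0.010 two-tail).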

What is the relationship of the p value to the level of significance $\alpha$? If we make a habit
of fixing $\alpha$ equal to the p value of a test statistic (e.g., the t statistic), then there is no conflict
between the two values. To put it differently, it is better to give up fixing $\alpha$ arbitrarily at

14 Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, pp. 126-127.
some level and simply choose the p value of the test statistic. It is preferable to leave it
to the reader to decide whether to reject the null hypothesis at the given p value. If in an
application the p value of a test statistic happens to be, say, 0.145, or 14.5 percent, and if
the reader wants to reject the null hypothesis at this (exact) level of significance, so be it.
Nothing is wrong with taking a chance of being wrong 14.5 percent of the time if you reject
the true null hypothesis. Similarly, as in our wages-education example, there is nothing
wrong if the researcher wants to choose a p value of less than 1 percent and not take a
chance of being wrong more than about 1 out of 100 times. After all, some investigators may
be risk-lovers and some risk-averters!

In the rest of this text, we will generally quote the p value of a given test statistic. Some
readers may want to fix $\alpha$ at some level and reject the null hypothesis if the p value is less
than $\alpha$. That is their choice.

Statistical Significance versus Practical Significance 

Look back at Example 3.1 and the regression results given in Equation (3.7.1). This
regression relates personal consumption expenditure (PCE) to gross domestic product (GDP) in
the U.S. for the period 1960-2005, both variables being measured in billions of 2000 dollars.

From this regression we see that the marginal propensity to consume (MPC), that is, the
additional consumption as a result of an additional dollar of income (as measured by GDP),
is about 0.72, or about 72 cents. Using the data in Eq. (3.7.1), the reader can verify that the
95 percent confidence interval for the MPC is (0.7129, 0.7306). (Note: Since there are 44 df
in this problem, we do not have a precise critical t value for these df. Hence, you can use
the 2-t rule of thumb to compute the 95 percent confidence interval.)

Suppose someone maintains that the true MPC is 0.74. Is this number different from 
0.72? It is, if we strictly adhere to the confidence interval established above. 

But what is the practical or substantive significance of our finding? That is, what
difference does it make if we take the MPC to be 0.74 rather than 0.72? Is this difference of 0.02
between the two MPCs that important practically?

The answer to this question depends on what we plan to do with these estimates. For
example, from macroeconomics we know that the income multiplier is 1/(1 − MPC). Thus,
if the MPC is 0.72, the multiplier is 3.57, but it is 3.85 if the MPC is 0.74. If the
government were to increase its expenditure by 1 dollar to lift the economy out of a recession, income
would eventually increase by 3.57 dollars if the MPC were 0.72, but it would increase by 3.85 dollars if
the MPC were 0.74. And that difference may or may not be crucial to resuscitating the
economy.
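The multiplier arithmetic is easy to check. A minimal sketch, using only the simple multiplier formula 1/(1 − MPC) stated in the text; `multiplier` is a hypothetical helper name:

```python
def multiplier(mpc):
    """Simple income multiplier, 1 / (1 - MPC)."""
    return 1 / (1 - mpc)


for mpc in (0.72, 0.74):
    print(f"MPC = {mpc:.2f}: multiplier = {multiplier(mpc):.2f}")
# MPC = 0.72: multiplier = 3.57
# MPC = 0.74: multiplier = 3.85
```

A difference of only 0.02 in the MPC thus changes the multiplier by about 0.27, because the MPC enters through $1 - \text{MPC}$ in the denominator.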

The point of all this discussion is that one should not confuse statistical significance 
with practical, or economic, significance. As Goldberger notes: 

When a null, say, $\beta_j = 1$, is specified, the likely intent is that $\beta_j$ is close to 1, so close that for
all practical purposes it may be treated as if it were 1. But whether 1.1 is “practically the same
as” 1.0 is a matter of economics, not of statistics. One cannot resolve the matter by relying on
a hypothesis test, because the test statistic $[t =]\,(b_j - 1)/\hat\sigma_{b_j}$ measures the estimated
coefficient in standard error units, which are not meaningful units in which to measure the economic
parameter $\beta_j - 1$. It may be a good idea to reserve the term “significance” for the statistical
concept, adopting “substantial” for the economic concept. 15


15 Arthur S. Goldberger, A Course in Econometrics, Harvard University Press, Cambridge, Massachusetts,
1991, p. 240. Note $b_j$ is the OLS estimator of $\beta_j$ and $\hat\sigma_{b_j}$ is its standard error. For a corroborating
view, see D. N. McCloskey, "The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests,"
American Economic Review, vol. 75, 1985, pp. 201-205. See also D. N. McCloskey and S. T. Ziliak,
"The Standard Error of Regressions," Journal of Economic Literature, vol. 34, 1996, pp. 97-114.
The point made by Goldberger is important. As sample size becomes very large, issues
of statistical significance become much less important but issues of economic significance 
become critical. Indeed, since with very large samples almost any null hypothesis will be 
rejected, there may be studies in which the magnitude of the point estimates may be 
the only issue. 

The Choice between Confidence-Interval and Test-of-Significance Approaches to Hypothesis Testing

In most applied economic analyses, the null hypothesis is set up as a straw man and the
objective of the empirical work is to knock it down, that is, reject the null hypothesis. Thus,
in our consumption-income example, the null hypothesis that the MPC $\beta_2 = 0$ is patently
absurd, but we often use it to dramatize the empirical results. Apparently editors of reputed
journals do not find it exciting to publish an empirical piece that does not reject the null
hypothesis. Somehow the finding that the MPC is statistically different from zero is more
newsworthy than the finding that it is equal to, say, 0.7!

Thus, J. Bradford De Long and Kevin Lang argue that it is better for economists 

... to concentrate on the magnitudes of coefficients and to report confidence levels and not
significance tests. If all or almost all null hypotheses are false, there is little point in
concentrating on whether or not an estimate is indistinguishable from its predicted value under the
null. Instead, we wish to cast light on what models are good approximations, which requires
that we know ranges of parameter values that are excluded by empirical estimates. 16

In short, these authors prefer the confidence-interval approach to the test-of-significance 
approach. The reader may want to keep this advice in mind. 17 

5.9 Regression Analysis and Analysis of Variance 

In this section we study regression analysis from the point of view of the analysis of 
variance and introduce the reader to an illuminating and complementary way of looking at 
the statistical inference problem. 

In Chapter 3, Section 3.5, we developed the following identity: 

$$\sum y_i^2 = \sum \hat y_i^2 + \sum \hat u_i^2 = \hat\beta_2^2 \sum x_i^2 + \sum \hat u_i^2 \qquad (3.5.2)$$

that is, TSS = ESS + RSS, which decomposed the total sum of squares (TSS) into two 
components: explained sum of squares (ESS) and residual sum of squares (RSS). A study 
of these components of TSS is known as the analysis of variance (ANOVA) from the 
regression viewpoint. 

Associated with any sum of squares is its df, the number of independent observations on
which it is based. TSS has $n-1$ df because we lose 1 df in computing the sample mean $\bar Y$.
RSS has $n-2$ df. (Why?) (Note: This is true only for the two-variable regression model
with the intercept $\beta_1$ present.) ESS has 1 df (again true of the two-variable case only),
which follows from the fact that ESS $= \hat\beta_2^2 \sum x_i^2$ is a function of $\hat\beta_2$ only, since $\sum x_i^2$ is
known.


16 See their article cited in footnote 13, p. 1271. 

17 For a somewhat different perspective, see Carter Hill, William Griffiths, and George Judge, 
Undergraduate Econometrics, Wiley & Sons, New York, 2001, p. 108. 
TABLE 5.3 ANOVA Table for the Two-Variable Regression Model

Source of Variation | SS* | df | MSS†
Due to regression (ESS) | $\sum \hat y_i^2 = \hat\beta_2^2 \sum x_i^2$ | 1 | $\hat\beta_2^2 \sum x_i^2$
Due to residuals (RSS) | $\sum \hat u_i^2$ | $n-2$ | $\sum \hat u_i^2/(n-2) = \hat\sigma^2$
TSS | $\sum y_i^2$ | $n-1$ |

*SS means sum of squares.
†Mean sum of squares, which is obtained by dividing SS by their df.


Let us arrange the various sums of squares and their associated df in Table 5.3, which is 
the standard form of the AOV table, sometimes called the ANOVA table. Given the entries 
of Table 5.3, we now consider the following variable: 


$$F = \frac{\text{MSS of ESS}}{\text{MSS of RSS}} = \frac{\hat\beta_2^2 \sum x_i^2}{\sum \hat u_i^2/(n-2)} = \frac{\hat\beta_2^2 \sum x_i^2}{\hat\sigma^2} \qquad (5.9.1)$$


If we assume that the disturbances $u_i$ are normally distributed, which we do under the
CNLRM, and if the null hypothesis ($H_0$) is that $\beta_2 = 0$, then it can be shown that the F
variable of Equation 5.9.1 follows the F distribution with 1 df in the numerator and $(n-2)$ df
in the denominator. (See Appendix 5A, Section 5A.3, for the proof. The general properties
of the F distribution are discussed in Appendix A.)

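What use can be made of the preceding F ratio? It can be shown 18 that

$$E\left(\hat\beta_2^2 \sum x_i^2\right) = \sigma^2 + \beta_2^2 \sum x_i^2 \qquad (5.9.2)$$

and

$$E\left[\frac{\sum \hat u_i^2}{n-2}\right] = E(\hat\sigma^2) = \sigma^2 \qquad (5.9.3)$$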

(Note that $\beta_2$ and $\sigma^2$ appearing on the right sides of these equations are the true
parameters.) Therefore, if $\beta_2$ is in fact zero, Equations 5.9.2 and 5.9.3 both reduce to the true
$\sigma^2$, so the numerator and the denominator of the F ratio estimate the same quantity. In this
situation, the explanatory variable X has no linear influence on Y whatsoever and the entire
variation in Y is explained by the random disturbances $u_i$.
If, on the other hand, $\beta_2$ is not zero, Eqs. (5.9.2) and (5.9.3) will be different and part of the
variation in Y will be ascribable to X. Therefore, the F ratio of Eq. (5.9.1) provides a test of
the null hypothesis $H_0: \beta_2 = 0$. Since all the quantities entering into this equation can be
obtained from the available sample, this F ratio provides a test statistic to test the null
hypothesis that true $\beta_2$ is zero. All that needs to be done is to compute the F ratio and
compare it with the critical F value obtained from the F tables at the chosen level of
significance, or obtain the p value of the computed F statistic.

18 For proof, see K. A. Brownlee, Statistical Theory and Methodology in Science and Engineering, John 
Wiley & Sons, New York, 1960, pp. 278-280. 
TABLE 5.4
ANOVA Table for the Wages-Education Example

Source of Variation        SS          df     MSS
Due to regression (ESS)    95.4255     1      95.4255
Due to residuals (RSS)     9.6928      11     0.8811
TSS                        105.1183    12

F = 95.4255/0.8811 = 108.3026


To illustrate, let us return to the wages-education example. The ANOVA table for this example is shown in Table 5.4. The computed F value is 108.3026. The p value of this F statistic corresponding to 1 and 11 df cannot be obtained from the F table given in Appendix D, but by using electronic statistical tables it can be shown that the p value is 0.0000001, an extremely small probability indeed. If you decide to choose the level-of-significance approach to hypothesis testing and fix $\alpha$ at 0.01, or a 1 percent level, you can see that the computed F of 108.3026 is obviously significant at this level. Therefore, if we reject the null hypothesis that $\beta_2 = 0$, the probability of committing a Type I error is very small. For all practical purposes, our sample could not have come from a population with a zero $\beta_2$ value, and we can conclude with great confidence that X, education, does affect Y, average wages.
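As a minimal sketch, the calculation behind Table 5.4 can be reproduced in a few lines of Python. The function name `anova_f` is hypothetical; the sums of squares and n = 13 are taken from Table 5.4, and the 1 percent critical value is obtained as the square of the two-tailed 1 percent t value for 11 df (3.106, from a standard t table), using the relation $F_{1,k} = t_k^2$ discussed in the next paragraph.

```python
# Sketch: the F ratio of Eq. (5.9.1) from the ANOVA entries of Table 5.4.

def anova_f(ess: float, rss: float, n: int) -> float:
    """Return F = (ESS / 1) / (RSS / (n - 2)) for the two-variable model."""
    mss_regression = ess / 1        # ESS has 1 df
    mss_residual = rss / (n - 2)    # RSS has n - 2 df; this equals sigma-hat squared
    return mss_regression / mss_residual


ESS, RSS, N = 95.4255, 9.6928, 13   # Table 5.4, wages-education example
f_stat = anova_f(ESS, RSS, N)

# 1 percent critical value of F(1, 11), taken as the square of t(0.005; 11) = 3.106
F_CRIT_01 = 3.106 ** 2

print(f"F = {f_stat:.4f}; critical F(1, 11) at 1% = {F_CRIT_01:.3f}")
print("Reject H0: beta2 = 0" if f_stat > F_CRIT_01 else "Do not reject H0: beta2 = 0")
```

Dividing by the unrounded residual MSS gives F of about 108.29; the 108.3026 in Table 5.4 comes from dividing by the rounded MSS of 0.8811.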

Refer to Theorem 5.7 of Appendix 5A.1, which states that the square of the t value with k df is an F value with 1 df in the numerator and k df in the denominator. For our example, if we assume $H_0$: $\beta_2 = 0$, then from Eq. (5.3.2) it can be easily verified that the estimated t value is about 10.41. This t value has 11 df. Under the same null hypothesis, the F value was 108.3026 with 1 and 11 df. Hence $(10.41)^2 \approx 108.37$, which equals the F value except for rounding errors.
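The equality $F_{1,k} = t_k^2$ can be checked directly from the two-variable formulas. The sketch below is illustrative, and the helper `t_and_f` is hypothetical. It uses $\hat{\beta}_2 = 0.7240$, $\sum x_i^2 = 182$ (the value used in the mean-prediction calculation of Section 5.10), and $\hat{\sigma}^2 = 0.8811$ (the residual MSS of Table 5.4), with $\operatorname{se}(\hat{\beta}_2) = \sqrt{\hat{\sigma}^2/\sum x_i^2}$.

```python
import math


def t_and_f(beta2_hat: float, sum_x2: float, sigma2_hat: float) -> tuple[float, float]:
    """t ratio and F ratio for H0: beta2 = 0 in the two-variable model."""
    se_beta2 = math.sqrt(sigma2_hat / sum_x2)   # standard error of beta2_hat
    t = beta2_hat / se_beta2                    # t ratio, Eq. (5.3.2) with beta2 = 0
    f = beta2_hat ** 2 * sum_x2 / sigma2_hat    # F ratio, Eq. (5.9.1)
    return t, f


t, f = t_and_f(beta2_hat=0.7240, sum_x2=182, sigma2_hat=0.8811)
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```

With these inputs t is about 10.41 and F about 108.27, and $t^2$ equals F up to floating-point error. The implied standard error, about 0.0696, is slightly smaller than the rounded 0.0700 reported in Eq. (5.11.1), which is why 0.7240/0.0700 gives the somewhat smaller value 10.34.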

Thus, the t and the F tests provide us with two alternative but complementary ways of 
testing the null hypothesis that $\beta_2 = 0$. If this is the case, why not just rely on the t test and
not worry about the F test and the accompanying analysis of variance? For the two-variable 
model there really is no need to resort to the F test. But when we consider the topic of 
multiple regression we will see that the F test has several interesting applications that make 
it a very useful and powerful method of testing statistical hypotheses. 


5.10 Application of Regression Analysis: 
The Problem of Prediction 


On the basis of the sample data of Table 3.2 we obtained the following sample regression:

$$\hat{Y}_i = -0.0144 + 0.7240X_i \qquad (3.6.1)$$

where $\hat{Y}_i$ is the estimator of the true $E(Y_i)$ corresponding to given X. What use can be made of this historical regression? One use is to “predict” or “forecast” the future mean wages Y corresponding to some given level of education X. Now there are two kinds of predictions: (1) prediction of the conditional mean value of Y corresponding to a chosen X, say, $X_0$, that is, the point on the population regression line itself (see Figure 2.2), and (2) prediction of an individual Y value corresponding to $X_0$. We shall call these two predictions the mean prediction and individual prediction.
Mean Prediction¹⁹

To fix the ideas, assume that $X_0 = 20$ and we want to predict $E(Y \mid X_0 = 20)$. Now it can be shown that the historical regression in Eq. (3.6.1) provides the point estimate of this mean prediction as follows:

$$\hat{Y}_0 = \hat{\beta}_1 + \hat{\beta}_2 X_0 = -0.0144 + 0.7240(20) = 14.4656 \qquad (5.10.1)$$

where $\hat{Y}_0$ = estimator of $E(Y \mid X_0)$. It can be proved that this point predictor is a best linear unbiased estimator (BLUE).

Since $\hat{Y}_0$ is an estimator, it is likely to be different from its true value. The difference between the two values will give some idea about the prediction or forecast error. To assess this error, we need to find out the sampling distribution of $\hat{Y}_0$. It is shown in Appendix 5A, Section 5A.4, that $\hat{Y}_0$ in Equation 5.10.1 is normally distributed with mean $(\beta_1 + \beta_2 X_0)$ and variance given by the following formula:

$$\operatorname{var}(\hat{Y}_0) = \sigma^2 \left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right] \qquad (5.10.2)$$

By replacing the unknown $\sigma^2$ by its unbiased estimator $\hat{\sigma}^2$, we see that the variable

$$t = \frac{\hat{Y}_0 - (\beta_1 + \beta_2 X_0)}{\operatorname{se}(\hat{Y}_0)} \qquad (5.10.3)$$

follows the t distribution with $n - 2$ df. The t distribution can therefore be used to derive confidence intervals for the true $E(Y_0 \mid X_0)$ and test hypotheses about it in the usual manner, namely,


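$$\Pr\left[\hat{\beta}_1 + \hat{\beta}_2 X_0 - t_{\alpha/2}\operatorname{se}(\hat{Y}_0) \le \beta_1 + \beta_2 X_0 \le \hat{\beta}_1 + \hat{\beta}_2 X_0 + t_{\alpha/2}\operatorname{se}(\hat{Y}_0)\right] = 1 - \alpha \qquad (5.10.4)$$

where $\operatorname{se}(\hat{Y}_0)$ is obtained from Eq. (5.10.2).

For our data (see Table 3.2),

$$\operatorname{var}(\hat{Y}_0) = 0.8936\left[\frac{1}{13} + \frac{(20 - 12)^2}{182}\right] = 0.3826$$

and

$$\operatorname{se}(\hat{Y}_0) = 0.6185$$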

Therefore, the 95 percent confidence interval for the true $E(Y \mid X_0) = \beta_1 + \beta_2 X_0$ is given by

$$14.4656 - 2.201(0.6185) \le E(Y_0 \mid X = 20) \le 14.4656 + 2.201(0.6185)$$

that is,

$$13.1043 \le E(Y \mid X = 20) \le 15.8269 \qquad (5.10.5)$$

19 For the proofs of the various statements made, see App. 5A, Sec. 5A.4.

FIGURE 5.6 Confidence intervals (bands) for mean Y and individual Y values. [Figure not reproduced: Y on the vertical axis, X on the horizontal axis.]

Thus, given $X_0 = 20$, in repeated sampling, 95 out of 100 intervals like Equation 5.10.5 will include the true mean value; the single best estimate of the true mean value is of course the point estimate 14.4656.

If we obtain 95 percent confidence intervals like Eq. (5.10.5) for each of the X values 
given in Table 3.2, we obtain what is known as the confidence interval, or confidence 
band, for the population regression function, which is shown in Figure 5.6. 
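The mean-prediction calculation of Eqs. (5.10.1), (5.10.2), and (5.10.4) can be sketched as follows. The function `mean_prediction_ci` is hypothetical; its inputs are the values used in the text: $\hat{\beta}_1 = -0.0144$, $\hat{\beta}_2 = 0.7240$, $X_0 = 20$, $\hat{\sigma}^2 = 0.8936$, n = 13, $\bar{X} = 12$, $\sum x_i^2 = 182$, and $t_{0.025} = 2.201$ for 11 df.

```python
import math


def mean_prediction_ci(b1, b2, x0, sigma2_hat, n, x_bar, sum_x2, t_crit):
    """Point estimate, standard error, and confidence interval for E(Y | X0)."""
    y0_hat = b1 + b2 * x0                                          # Eq. (5.10.1)
    var_y0 = sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / sum_x2)     # Eq. (5.10.2)
    se_y0 = math.sqrt(var_y0)
    half_width = t_crit * se_y0
    return y0_hat, se_y0, (y0_hat - half_width, y0_hat + half_width)  # Eq. (5.10.4)


y0_hat, se_y0, (lower, upper) = mean_prediction_ci(
    b1=-0.0144, b2=0.7240, x0=20, sigma2_hat=0.8936,
    n=13, x_bar=12, sum_x2=182, t_crit=2.201)
print(f"Y0_hat = {y0_hat:.4f}, se = {se_y0:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Because the code does not round the variance and standard error before forming the interval, its limits differ from those of Eq. (5.10.5) by less than 0.001.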

Individual Prediction

If our interest lies in predicting an individual Y value, $Y_0$, corresponding to a given X value, say, $X_0$, then, as shown in Appendix 5A, Section 5A.4, a best linear unbiased estimator of $Y_0$ is also given by Eq. (5.10.1), but its variance is as follows:

$$\operatorname{var}(Y_0 - \hat{Y}_0) = E[Y_0 - \hat{Y}_0]^2 = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right] \qquad (5.10.6)$$

It can be shown further that $\hat{Y}_0$ also follows the normal distribution with mean and variance given by Eqs. (5.10.1) and (5.10.6), respectively. Substituting $\hat{\sigma}^2$ for the unknown $\sigma^2$, it follows that

$$t = \frac{Y_0 - \hat{Y}_0}{\operatorname{se}(Y_0 - \hat{Y}_0)}$$

also follows the t distribution. Therefore, the t distribution can be used to draw inferences about the true $Y_0$. Continuing with our example, we see that the point prediction of $Y_0$ is 14.4656, the same as the mean prediction in Eq. (5.10.1), and its variance is 1.2357 (the reader should verify this calculation). Therefore, the 95 percent confidence interval for $Y_0$ corresponding to $X_0 = 20$ is seen to be

$$12.0190 \le Y_0 \mid X_0 = 20 \le 16.9122 \qquad (5.10.7)$$

Comparing this interval with Eq. (5.10.5), we see that the confidence interval for individual $Y_0$ is wider than that for the mean value of $Y_0$. (Why?) Computing confidence intervals like Equation 5.10.7 conditional upon the X values given in Table 3.2, we obtain the 95 percent confidence band for the individual Y values corresponding to these X values. This confidence band along with the confidence band for $\hat{Y}_0$ associated with the same X's is shown in Figure 5.6.

Notice an important feature of the confidence bands shown in Figure 5.6. The width of these bands is smallest when $X_0 = \bar{X}$. (Why?) However, the bands widen sharply as $X_0$ moves away from $\bar{X}$. (Why?) This change suggests that the predictive ability of the historical sample regression line falls markedly as $X_0$ departs progressively from $\bar{X}$. Therefore, one should exercise great caution in “extrapolating” the historical regression line to predict $E(Y \mid X_0)$ or $Y_0$ associated with a given $X_0$ that is far removed from the sample mean $\bar{X}$.
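The widening of the bands can be seen numerically from Eqs. (5.10.2) and (5.10.6). In this minimal sketch (the helper `half_widths` is hypothetical), the 95 percent half-widths for the mean and for an individual prediction are computed at several values of $X_0$, reusing the text's inputs $\hat{\sigma}^2 = 0.8936$, n = 13, $\bar{X} = 12$, $\sum x_i^2 = 182$, and $t_{0.025} = 2.201$.

```python
import math

SIGMA2_HAT, N, X_BAR, SUM_X2, T_CRIT = 0.8936, 13, 12, 182, 2.201


def half_widths(x0: float) -> tuple[float, float]:
    """95% half-widths of the mean and individual prediction intervals at X0."""
    distance_term = 1 / N + (x0 - X_BAR) ** 2 / SUM_X2
    mean_hw = T_CRIT * math.sqrt(SIGMA2_HAT * distance_term)          # from Eq. (5.10.2)
    indiv_hw = T_CRIT * math.sqrt(SIGMA2_HAT * (1 + distance_term))   # from Eq. (5.10.6)
    return mean_hw, indiv_hw


for x0 in (0, 6, 12, 18, 24):
    mean_hw, indiv_hw = half_widths(x0)
    print(f"X0 = {x0:2d}: mean band +/- {mean_hw:.3f}, individual band +/- {indiv_hw:.3f}")
```

Both half-widths are smallest at $X_0 = \bar{X} = 12$, are equal at points equally far from $\bar{X}$ (such as 6 and 18), and grow as $X_0$ moves away; the individual band is always wider because of the extra 1 inside the brackets of Eq. (5.10.6).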


5.11 Reporting the Results of Regression Analysis 

There are various ways of reporting the results of regression analysis, but in this text we 
shall use the following format, employing the wages-education example of Chapter 3 as an 
illustration: 


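$$
\begin{aligned}
\hat{Y}_i &= -0.0144 + 0.7240X_i \\
\text{se} &= (0.9317) \quad (0.0700) \qquad r^2 = 0.9065 \\
t &= (-0.0154) \quad (10.3428) \qquad \text{df} = 11 \qquad (5.11.1) \\
p &= (0.987) \quad (0.000) \qquad F_{1,11} = 108.30
\end{aligned}
$$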


In Equation 5.11.1 the figures in the first set of parentheses are the estimated standard 
errors of the regression coefficients, the figures in the second set are estimated t values 
computed from Eq. (5.3.2) under the null hypothesis that the true population value of each 
regression coefficient individually is zero (e.g., 10.3428 = 0.7240/0.0700), and the figures in the
third set are the estimated p values. Thus, for 11 df the probability of obtaining a t value of 
10.3428 or greater is 0.00009, which is practically zero. 

By presenting the p values of the estimated t coefficients, we can see at once the exact 
level of significance of each estimated t value. Thus, under the null hypothesis that the true 
population slope value is zero (that is, education has no effect on mean wages), the
exact probability of obtaining a t value of 10.3428 or greater is practically zero. Recall that 
the smaller the p value, the smaller the probability of making a mistake if we reject the null 
hypothesis. 
Earlier we showed the intimate connection between the F and t statistics, namely, $F_{1,k} = t_k^2$. Under the null hypothesis that the true $\beta_2 = 0$, Eq. (5.11.1) shows that the F value is 108.30 (for 1 numerator and 11 denominator df) and the t value is about 10.34 (11 df); as expected, the former value is the square of the latter value, except for roundoff errors. The ANOVA table for this problem has already been discussed.


5.12 Evaluating the Results of Regression Analysis 


In Figure 1.4 of the Introduction we sketched the anatomy of econometric modeling. Now 
that we have presented the results of regression analysis of our wages-education example in 
Eq. (5.11.1), we would like to question the adequacy of the fitted model. How “good” is the 
fitted model? We need some criteria with which to answer this question. 

First, are the signs of the estimated coefficients in accordance with theoretical or prior expectations? A priori, $\beta_2$ in the wages-education example should be positive. In the present example it is. Second, if theory says that the relationship should be not only positive but also statistically significant, is this the case in the present application? As we discussed in Section 5.11, the education coefficient is not only positive but also statistically significantly different from zero; the p value of the estimated t value is extremely small. The intercept coefficient, by contrast, is not statistically different from zero (its p value is 0.987). Third, how well does the regression model explain variation in mean wages? One can use $r^2$ to answer this question. In the present example $r^2$ is about 0.91, which is a very high value considering that $r^2$ can be at most 1.

Thus, the model we have chosen for explaining mean wages seems quite good. But 
before we sign off, we would like to find out whether our model satisfies the assumptions 
of the CNLRM. We will not look at the various assumptions now because the model is patently
so simple. But there is one assumption that we would like to check, namely, the normality
of the disturbance term, $u_i$. Recall that the t and F tests used before require that the error
term follow the normal distribution. Otherwise, the testing procedure will not be valid in 
small, or finite, samples. 

Normality Tests 

Although several tests of normality are discussed in the literature, we will consider just 
three: (1) histogram of residuals; (2) normal probability plot (NPP), a graphical device; and 
(3) the Jarque-Bera test. 

Histogram of Residuals 

A histogram of residuals is a simple graphic device that is used to learn something about the shape of the probability density function (PDF) of a random variable. On the horizontal axis, we divide the values of the variable of interest (e.g., OLS residuals) into suitable intervals, and in each class interval we erect rectangles equal in height to the number of observations (i.e., frequency) in that class interval. If you mentally superimpose the bell-shaped normal distribution curve on the histogram, you will get some idea as to whether a normal PDF approximation may be appropriate. For the wages-education regression, the histogram of the residuals is shown in Figure 5.7.

This diagram shows that the residuals are not perfectly normally distributed; for a normally distributed variable the skewness (a measure of symmetry) should be zero and the kurtosis (which measures how tall or squat the distribution is) should be 3.

Still, it is always good practice to plot the histogram of residuals from any regression as a rough-and-ready method of checking the normality assumption.
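The construction just described (equal-width class intervals, with the frequency counted in each) can be sketched as follows. The helper `histogram_counts` and the residual values in the usage lines are purely illustrative; they are not the wages-education residuals, which are shown only in Figure 5.7.

```python
def histogram_counts(values: list[float], n_bins: int) -> tuple[list[float], list[int]]:
    """Equal-width class intervals over [min, max] and the frequency in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, n_bins - 1)] += 1   # the maximum value goes in the last interval
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


# Hypothetical residuals, for illustration only
residuals = [-1.4, -0.9, -0.6, -0.2, -0.1, 0.0, 0.3, 0.4, 0.8, 1.3]
edges, counts = histogram_counts(residuals, n_bins=4)
for left, count in zip(edges, counts):
    print(f"{left:6.2f} | {'#' * count}")
```

Each printed row is one class interval (labeled by its left edge) with a bar whose length equals the frequency, a text version of the rectangles described above.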
FIGURE 5.7 Histogram of residuals for wages-education data. [Figure not reproduced: histogram of the residuals (response is mean hourly wage); horizontal axis: Residual.]

Normal Probability Plot 

A comparatively simple graphical device to study the shape of the probability density function (PDF) of a random variable is the normal probability plot (NPP), which makes use of normal probability paper, a specially designed graph paper. On the horizontal, or X, axis, we plot values of the variable of interest (say, OLS residuals, $\hat{u}_i$), and on the vertical, or Y, axis, we show the expected value of this variable if it were normally distributed. Therefore, if the variable is in fact from the normal population, the NPP will be approximately a straight line. The NPP of the residuals from our wages-education regression is shown in Figure 5.8, which is obtained from the MINITAB software package, version 15. As noted earlier, if the fitted line in the NPP is approximately a straight line, one can conclude that the variable of interest is normally distributed. In Figure 5.8, we see that residuals from our illustrative example are approximately normally distributed, because a straight line seems to fit the data reasonably well.

MINITAB also produces the Anderson-Darling normality test, known as the $A^2$ statistic. The underlying null hypothesis is that the variable under consideration is normally distributed. As Figure 5.8 shows, for our example, the computed $A^2$ statistic is 0.289. The p value of obtaining such a value of $A^2$ is 0.558, which is reasonably high. Therefore, we do not reject the hypothesis that the residuals from our illustrative example are normally distributed. Incidentally, Figure 5.8 shows the parameters of the (normal) distribution: the mean is approximately 0, and the standard deviation is about 0.8987.
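A normal probability plot pairs the ordered residuals with the values expected under normality. The sketch below computes those expected values as standard-normal quantiles at the plotting positions $(i - 0.5)/n$; that choice of plotting positions is an assumption (MINITAB and other packages may use a different rule), and the helpers `normal_scores` and `pearson_r` are hypothetical. A correlation close to 1 between the two columns corresponds to an NPP that is close to a straight line. Only Python's standard `statistics` module is used.

```python
from statistics import NormalDist, fmean


def normal_scores(n: int) -> list[float]:
    """Standard-normal quantiles at plotting positions (i - 0.5) / n, i = 1..n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Ordinary correlation coefficient between two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals, for illustration only
residuals = [-1.4, -0.9, -0.6, -0.2, -0.1, 0.0, 0.3, 0.4, 0.8, 1.3]
ordered = sorted(residuals)
scores = normal_scores(len(ordered))
for e, z in zip(ordered, scores):
    print(f"residual {e:5.2f}  expected normal score {z:6.3f}")
print(f"correlation = {pearson_r(ordered, scores):.4f}")
```

Plotting `ordered` against `scores` gives the NPP itself; the correlation is only a one-number summary of how straight that plot is, not a substitute for the formal tests discussed here.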

Jarque-Bera (JB) Test of Normality²⁰

The JB test of normality is an asymptotic, or large-sample, test. It is also based on the OLS 
residuals. This test first computes the skewness and kurtosis (discussed in Appendix A) 
measures of the OLS residuals and uses the following test statistic: 


$$\text{JB} = n\left[\frac{S^2}{6} + \frac{(K - 3)^2}{24}\right] \qquad (5.12.1)$$


20 See C. M. Jarque and A. K. Bera, “A Test for Normality of Observations and Regression Residuals,” International Statistical Review, vol. 55, 1987, pp. 163-172.
FIGURE 5.8 Residuals from wages-education regression. [Figure not reproduced: normal probability plot of the residuals (RESI1).]

where n = sample size, S = skewness coefficient, and K = kurtosis coefficient. For a normally distributed variable, S = 0 and K = 3. Therefore, the JB test of normality is a test of the joint hypothesis that S and K are 0 and 3, respectively. In that case the value of the JB statistic is expected to be 0.

Under the null hypothesis that the residuals are normally distributed, Jarque and Bera showed that asymptotically (i.e., in large samples) the JB statistic given in Equation (5.12.1) follows the chi-square distribution with 2 df. If the computed p value of the JB statistic in an application is sufficiently low, which will happen if the value of the statistic is very different from 0, one can reject the hypothesis that the residuals are normally distributed. But if the p value is reasonably high, which will happen if the value of the statistic is close to zero, we do not reject the normality assumption.

For our wages-education example, the estimated JB statistic is 0.8286. The null hypothesis that the residuals in the present example are normally distributed cannot be rejected, for the p value of obtaining a JB statistic of 0.8286 or greater is about 0.66, or 66 percent. This probability is quite high. Note that although our regression has 13 observations, these observations were obtained from a sample of 528 observations, which seems reasonably large.
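The JB calculation can be sketched in a few lines of Python; the helper names are hypothetical. Skewness and kurtosis are computed from the sample moments of the residuals, $S = m_3/m_2^{3/2}$ and $K = m_4/m_2^2$, and the p value uses the fact that a chi-square variable with 2 df has survival function $e^{-x/2}$, so no statistical tables are needed.

```python
import math
from statistics import fmean


def jarque_bera(residuals: list[float]) -> tuple[float, float, float]:
    """Return (S, K, JB) as in Eq. (5.12.1), using moment-based S and K."""
    n = len(residuals)
    mean = fmean(residuals)
    dev = [e - mean for e in residuals]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    s = m3 / m2 ** 1.5    # skewness coefficient
    k = m4 / m2 ** 2      # kurtosis coefficient (3 for a normal variable)
    jb = n * (s ** 2 / 6 + (k - 3) ** 2 / 24)
    return s, k, jb


def jb_p_value(jb: float) -> float:
    """Asymptotic p value: P(chi-square with 2 df > jb) = exp(-jb / 2)."""
    return math.exp(-jb / 2)


# p values for the JB statistics reported in the text
print(f"wages-education: {jb_p_value(0.8286):.3f}")    # text: about 0.66
print(f"food expenditure: {jb_p_value(0.2576):.3f}")   # text: about 88 percent
```

Because the chi-square distribution of JB is only asymptotic, a p value computed this way for a sample as small as 13 observations should be read as a rough guide.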


Other Tests of Model Adequacy 

Remember that the CNLRM makes many more assumptions than the normality of the error 
term. As we examine econometric theory further, we will consider several tests of model 
adequacy (see Chapter 13). Until then, keep in mind that our regression modeling is based 
on several simplifying assumptions that may not hold in each and every case. 
A Concluding Example

FIGURE 5.9 Residuals from the food expenditure regression. [Figure not reproduced.]

Let us return to Example 3.2 about food expenditure in India. Using the data given in Equation (3.7.2) and adopting the format of Equation (5.11.1), we obtain the following expenditure equation:

$$
\begin{aligned}
\widehat{\text{FoodExp}}_i &= 94.2087 + 0.4368\,\text{TotalExp}_i \\
\text{se} &= (50.8563) \quad (0.0783) \\
t &= (1.8524) \quad (5.5770) \qquad (5.12.2) \\
p &= (0.0695) \quad (0.0000)^{*} \\
r^2 &= 0.3698; \quad \text{df} = 53 \\
F_{1,53} &= 31.1034 \quad (p \text{ value} = 0.0000)^{*}
\end{aligned}
$$

where * denotes extremely small.

First, let us interpret this regression. As expected, there is a positive relationship between 
expenditure on food and total expenditure. If total expenditure went up by a rupee, on 
average, expenditure on food increased by about 44 paise. If total expenditure were zero, 
the average expenditure on food would be about 94 rupees. Of course, this mechanical 
interpretation of the intercept may not make much economic sense. The $r^2$ value of about
0.37 means that 37 percent of the variation in food expenditure is explained by total 
expenditure, a proxy for income. 

Suppose we want to test the null hypothesis that there is no relationship between food expenditure and total expenditure, that is, the true slope coefficient $\beta_2 = 0$. The estimated value of $\beta_2$ is 0.4368. If the null hypothesis were true, what is the probability of obtaining a value as large as 0.4368? Under the null hypothesis, we observe from Eq. (5.12.2) that the t value is 5.5770 and the p value of obtaining such a t value is practically zero. In other words, we can reject the null hypothesis resoundingly. But suppose the null hypothesis were that $\beta_2 = 0.5$. Now what? Using the t test we obtain:

$$t = \frac{0.4368 - 0.5}{0.0783} = -0.8071$$

The probability of obtaining a $|t|$ of 0.8071 or greater is more than 20 percent. Hence we do not reject the hypothesis that the true $\beta_2$ is 0.5.

Notice that, under the null hypothesis that the true slope coefficient is zero, the F value is 31.1034, as shown in Eq. (5.12.2). Under the same null hypothesis, we obtained a t value of 5.5770. If we square this value, we obtain 31.1029, which is about the same as the F value, again showing the close relationship between the t and the F statistic. (Note: The numerator df for the F statistic must be 1, which is the case here.)
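The two t tests and the $t^2 \approx F$ check in this example can be reproduced as below. The helper `t_ratio` is hypothetical, and the inputs are the rounded estimates of Eq. (5.12.2). The 5 percent two-tailed critical value for 53 df, about 2.006, is taken from a standard t table and is an assumed input rather than a number reported in the text.

```python
def t_ratio(estimate: float, hypothesized: float, se: float) -> float:
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / se


B2_HAT, SE_B2 = 0.4368, 0.0783   # slope and its standard error, Eq. (5.12.2)
T_CRIT_53 = 2.006                # approx. t(0.025; 53), assumed from a t table

for h0 in (0.0, 0.5):
    t = t_ratio(B2_HAT, h0, SE_B2)
    decision = "reject" if abs(t) > T_CRIT_53 else "do not reject"
    print(f"H0: beta2 = {h0}: t = {t:.4f} -> {decision} at the 5% level")

# Squaring the reported t of 5.5770 gives a value close to the reported F of 31.1034
print(f"5.5770^2 = {5.5770 ** 2:.4f}")
```

Dividing the rounded slope by the rounded standard error gives a t of about 5.578 for $H_0$: $\beta_2 = 0$, marginally different from the 5.5770 printed in Eq. (5.12.2), which was presumably computed from unrounded values.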

Using the estimated residuals from the regression, what can we say about the probability distribution of the error term? The information is given in Figure 5.9. As the figure shows, the residuals from the food expenditure regression seem to be symmetrically distributed. Application of the Jarque-Bera test shows that the JB statistic is about 0.2576, and the probability of obtaining such a statistic under the normality assumption is about 88 percent. Therefore, we do not reject the hypothesis that the error terms are normally distributed. But keep in mind that the sample size of 55 observations may not be large enough.

We leave it to the reader to establish confidence intervals for the two regression 
coefficients as well as to obtain the normal probability plot and do mean and individual 
predictions. 


Summary and 
Conclusions 


1. Estimation and hypothesis testing constitute the two main branches of classical statistics. 
Having discussed the problem of estimation in Chapters 3 and 4, we have taken up the 
problem of hypothesis testing in this chapter. 

2. Hypothesis testing answers this question: Is a given finding compatible with a stated 
hypothesis or not? 

3. There are two mutually complementary approaches to answering the preceding 
question: confidence interval and test of significance. 

4. Underlying the confidence-interval approach is the concept of interval estimation. An interval estimator is an interval or range constructed in such a manner that it has a specified probability of including within its limits the true value of the unknown parameter. The interval thus constructed is known as a confidence interval, which is often stated in percent form, such as 90 or 95 percent. The confidence interval provides a set of plausible hypotheses about the value of the unknown parameter. If the null-hypothesized value lies in the confidence interval, the hypothesis is not rejected, whereas if it lies outside this interval, the null hypothesis can be rejected.

5. In the significance test procedure, one develops a test statistic and examines its sampling distribution under the null hypothesis. The test statistic usually follows a well-defined probability distribution such as the normal, t, F, or chi-square. Once a test statistic (e.g., the t statistic) is computed from the data at hand, its p value can be easily obtained. The p value gives the exact probability of obtaining a value of the test statistic as extreme as, or more extreme than, the estimated one under the null hypothesis. If this p value is small, one can reject the null hypothesis, but if it is large one may not reject it. What constitutes a small or large p value is up to the investigator. In choosing this cutoff the investigator has to bear in mind the probabilities of committing Type I and Type II errors.

6. In practice, one should be careful in fixing $\alpha$, the probability of committing a Type I
error, at arbitrary values such as 1, 5, or 10 percent. It is better to quote the p value of 
the test statistic. Also, the statistical significance of an estimate should not be confused 
with its practical significance. 

7. Of course, hypothesis testing presumes that the model chosen for empirical analysis is 
adequate in the sense that it does not violate one or more assumptions underlying the 
classical normal linear regression model. Therefore, tests of model adequacy should 
precede tests of hypothesis. This chapter introduced one such test, the normality test, to 
find out whether the error term follows the normal distribution. Since in small, or finite, 
samples, the t, F, and chi-square tests require the normality assumption, it is important 
that this assumption be checked formally. 

8. If the model is deemed practically adequate, it may be used for forecasting purposes. But in forecasting the future values of the regressand, one should not go too far out of the sample range of the regressor values. Otherwise, forecasting errors can increase dramatically.
EXERCISES


Questions 

5.1. State with reason whether the following statements are true, false, or uncertain. Be 
precise. 

a. The t test of significance discussed in this chapter requires that the sampling distributions of estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ follow the normal distribution.

b. Even though the disturbance term in the CLRM is not normally distributed, the OLS estimators are still unbiased.

c. If there is no intercept in the regression model, the estimated $u_i$ ($= \hat{u}_i$) will not sum to zero.

d. The p value and the size of a test statistic mean the same thing.

e. In a regression model that contains the intercept, the sum of the residuals is always zero.

f. If a null hypothesis is not rejected, it is true.

g. The higher the value of $\sigma^2$, the larger is the variance of $\hat{\beta}_2$ given in Eq. (3.3.1).

h. The conditional and unconditional means of a random variable are the same things.

i. In the two-variable PRF, if the slope coefficient is zero, the intercept is estimated by the sample mean $\bar{Y}$.

j. The conditional variance, $\operatorname{var}(Y_i \mid X_i) = \sigma^2$, and the unconditional variance of Y, $\operatorname{var}(Y) = \sigma_Y^2$, will be the same if X had no influence on Y.

5.2. Set up the ANOVA table in the manner of Table 5.4 for the regression model given 
in Eq. (3.7.2) and test the hypothesis that there is no relationship between food 
expenditure and total expenditure in India. 

5.3. Refer to the demand for cell phones regression given in Eq. (3.7.3). 

a. Is the estimated intercept coefficient significant at the 5 percent level of significance? What is the null hypothesis you are testing?

b. Is the estimated slope coefficient significant at the 5 percent level? What is the underlying null hypothesis?

c. Establish a 95 percent confidence interval for the true slope coefficient.

d. What is the mean forecast value of cell phones demanded if the per capita 
income is $9,000? What is the 95 percent confidence interval for the forecast 
value? 

5.4. Let $\rho^2$ represent the true population coefficient of determination. Suppose you want to test the hypothesis that $\rho^2 = 0$. Verbally explain how you would test this hypothesis. Hint: Use Eq. (3.5.11). See also Exercise 5.7.

5.5. What is known as the characteristic line of modern investment analysis is simply the regression line obtained from the following model:

$$r_{it} = \alpha_i + \beta_i r_{mt} + u_t$$

where $r_{it}$ = the rate of return on the ith security in time t
$r_{mt}$ = the rate of return on the market portfolio in time t
$u_t$ = stochastic disturbance term

In this model $\beta_i$ is known as the beta coefficient of the ith security, a measure of market (or systematic) risk of a security.*


*See Haim Levy and Marshall Sarnat, Portfolio and Investment Selection: Theory and Practice, Prentice 
Hall International, Englewood Cliffs, NJ, 1984, Chap. 12. 
On the basis of 240 monthly rates of return for the period 1956-1976, Fogler and Ganapathy obtained the following characteristic line for IBM stock in relation to the market portfolio index developed at the University of Chicago:*

$$
\begin{aligned}
\hat{r}_{it} &= 0.7264 + 1.0598\,r_{mt} \qquad r^2 = 0.4710 \\
\text{se} &= (0.3001) \quad (0.0728) \qquad \text{df} = 238 \\
F_{1,238} &= 211.896
\end{aligned}
$$

a. A security whose beta coefficient is greater than one is said to be a volatile or 
aggressive security. Was IBM a volatile security in the time period under study? 

b. Is the intercept coefficient significantly different from zero? If it is, what is its 
practical meaning? 

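5.6. Equation (5.3.5) can also be written as

$$\Pr\left[\hat{\beta}_2 - t_{\alpha/2}\operatorname{se}(\hat{\beta}_2) < \beta_2 < \hat{\beta}_2 + t_{\alpha/2}\operatorname{se}(\hat{\beta}_2)\right] = 1 - \alpha$$

That is, the weak inequality (≤) can be replaced by the strong inequality (<). Why?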

5.7. R. A. Fisher has derived the sampling distribution of the correlation coefficient defined in Eq. (3.5.13). If it is assumed that the variables X and Y are jointly normally distributed, that is, if they come from a bivariate normal distribution (see Appendix 4A, Exercise 4.1), then under the assumption that the population correlation coefficient $\rho$ is zero, it can be shown that $t = r\sqrt{n-2}/\sqrt{1-r^2}$ follows Student's t distribution with $n - 2$ df.** Show that this t value is identical with the t value given in Eq. (5.3.2) under the null hypothesis that $\beta_2 = 0$. Hence establish that under the same null hypothesis $F = t^2$. (See Section 5.9.)

5.8. Consider the following regression output:†

$$
\begin{aligned}
\hat{Y}_t &= 0.2033 + 0.6560X_t \\
\text{se} &= (0.0976) \quad (0.1961)
\end{aligned}
$$

$r^2 = 0.397$    RSS = 0.0544    ESS = 0.0358

where Y = labor force participation rate (LFPR) of women in 1972 and X = LFPR of women in 1968. The regression results were obtained from a sample of 19 cities in the United States.

a. How do you interpret this regression?

b. Test the hypothesis $H_0$: $\beta_2 = 1$ against $H_1$: $\beta_2 > 1$. Which test do you use? And why? What are the underlying assumptions of the test(s) you use?

c. Suppose that the LFPR in 1968 was 0.58 (or 58 percent). On the basis of the regression results given above, what is the mean LFPR in 1972? Establish a 95 percent confidence interval for the mean prediction.

d. How would you test the hypothesis that the error term in the population regression is normally distributed? Show the necessary calculations.


*H. Russell Fogler and Sundaram Ganapathy, Financial Econometrics, Prentice Hall, Englewood Cliffs, 
NJ, 1982, p. 13. 

**If $\rho$ is in fact zero, Fisher has shown that r follows the same t distribution provided either X or Y is normally distributed. But if $\rho$ is not equal to zero, both variables must be normally distributed. See R. L. Anderson and T. A. Bancroft, Statistical Theory in Research, McGraw-Hill, New York, 1952, pp. 87-88.

†Adapted from Samprit Chatterjee, Ali S. Hadi, and Bertram Price, Regression Analysis by Example, 3d ed., Wiley Interscience, New York, 2000, pp. 46-47.
TABLE 5.5
Average Salary and Per Pupil Spending (dollars), 1985

Observation   Salary   Spending     Observation   Salary   Spending
  1           19,583   3346         27            22,795   3366
  2           20,263   3114         28            21,570   2920
  3           20,325   3554         29            22,080   2980
  4           26,800   4642         30            22,250   3731
  5           29,470   4669         31            20,940   2853
  6           26,610   4888         32            21,800   2533
  7           30,678   5710         33            22,934   2729
  8           27,170   5536         34            18,443   2305
  9           25,853   4168         35            19,538   2642
 10           24,500   3547         36            20,460   3124
 11           24,274   3159         37            21,419   2752
 12           27,170   3621         38            25,160   3429
 13           30,168   3782         39            22,482   3947
 14           26,525   4247         40            20,969   2509
 15           27,360   3982         41            27,224   5440
 16           21,690   3568         42            25,892   4042
 17           21,974   3155         43            22,644   3402
 18           20,816   3059         44            24,640   2829
 19           18,095   2967         45            22,341   2297
 20           20,939   3285         46            25,610   2932
 21           22,644   3914         47            26,015   3705
 22           24,624   4517         48            25,788   4123
 23           27,186   4349         49            29,132   3608
 24           33,990   5020         50            41,480   8349
 25           23,382   3594         51            25,845   3766
 26           20,627   2821

Source: National Education Association, as reported by Albuquerque Tribune, Nov. 7, 1986.

Empirical Exercises 

5.9. Table 5.5 gives data on average public teacher pay (annual salary in dollars) and spend¬ 
ing on public schools per pupil (dollars) in 1985 for 50 states and the District of 
Columbia. 

To find out if there is any relationship between teacher’s pay and per pupil expendi¬ 
ture in public schools, the following model was suggested: Pay, = f5\ + Spend,- + 
Ui, where Pay stands for teacher’s salary and Spend stands for per pupil expenditure. 

a. Plot the data and eyeball a regression line. 

b. Suppose on the basis of (a) you decide to estimate the above regression model. 
Obtain the estimates of the parameters, their standard errors, \(r^2\), RSS, and ESS.

c. Interpret the regression. Does it make economic sense? 

d. Establish a 95 percent confidence interval for \(\beta_2\). Would you reject the hypothesis
that the true slope coefficient is 3.0?

e. Obtain the mean and individual forecast value of Pay if per pupil spending is
$5,000. Also establish 95 percent confidence intervals for the true mean and individual
values of Pay for the given spending figure.

f. How would you test the assumption of the normality of the error term? Show the
test(s) you use.
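The computations asked for in parts (b), (d), and (e) can be sketched in plain Python, assuming the classical normal linear regression model of this chapter. The data are transcribed from Table 5.5; the helper name `ols_simple` is our own, and the critical value 2.0096 for \(t_{0.025}\) with 49 df is an assumed value read from a standard t table rather than computed. Treat this as a way to check hand calculations, not as a substitute for a regression package.

```python
import math

# Table 5.5, observations 1-51 in table order (dollars, 1985).
pay = [19583, 20263, 20325, 26800, 29470, 26610, 30678, 27170, 25853, 24500,
       24274, 27170, 30168, 26525, 27360, 21690, 21974, 20816, 18095, 20939,
       22644, 24624, 27186, 33990, 23382, 20627, 22795, 21570, 22080, 22250,
       20940, 21800, 22934, 18443, 19538, 20460, 21419, 25160, 22482, 20969,
       27224, 25892, 22644, 24640, 22341, 25610, 26015, 25788, 29132, 41480,
       25845]
spend = [3346, 3114, 3554, 4642, 4669, 4888, 5710, 5536, 4168, 3547,
         3159, 3621, 3782, 4247, 3982, 3568, 3155, 3059, 2967, 3285,
         3914, 4517, 4349, 5020, 3594, 2821, 3366, 2920, 2980, 3731,
         2853, 2533, 2729, 2305, 2642, 3124, 2752, 3429, 3947, 2509,
         5440, 4042, 3402, 2829, 2297, 2932, 3705, 4123, 3608, 8349,
         3766]


def ols_simple(x, y):
    """Two-variable OLS with an intercept, using mean-corrected sums."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    ess = b2 ** 2 * sxx                 # explained sum of squares
    sigma2 = rss / (n - 2)              # unbiased estimator of sigma^2
    return {
        "n": n, "xbar": xbar, "sxx": sxx, "b1": b1, "b2": b2,
        "se_b1": math.sqrt(sigma2 * sum(xi * xi for xi in x) / (n * sxx)),
        "se_b2": math.sqrt(sigma2 / sxx),
        "rss": rss, "ess": ess, "tss": tss, "r2": ess / tss,
        "sigma2": sigma2, "resid": resid,
    }


fit = ols_simple(spend, pay)
T_CRIT = 2.0096  # assumed t_{0.025} for n - 2 = 49 df (table value)

# (d) 95% confidence interval for beta2 and t test of H0: beta2 = 3.0
ci_b2 = (fit["b2"] - T_CRIT * fit["se_b2"], fit["b2"] + T_CRIT * fit["se_b2"])
t_b2_eq_3 = (fit["b2"] - 3.0) / fit["se_b2"]
reject_b2_eq_3 = abs(t_b2_eq_3) > T_CRIT

# (e) mean and individual forecasts at Spend = 5000, Eqs. (5.10.2) and (5.10.6)
x0 = 5000.0
y0_hat = fit["b1"] + fit["b2"] * x0
h0 = 1 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"]
se_mean = math.sqrt(fit["sigma2"] * h0)
se_indiv = math.sqrt(fit["sigma2"] * (1 + h0))
mean_ci = (y0_hat - T_CRIT * se_mean, y0_hat + T_CRIT * se_mean)
indiv_ci = (y0_hat - T_CRIT * se_indiv, y0_hat + T_CRIT * se_indiv)

print(f"Pay-hat = {fit['b1']:.2f} + {fit['b2']:.4f} Spend")
print(f"se = ({fit['se_b1']:.2f}) ({fit['se_b2']:.4f}), r2 = {fit['r2']:.4f}")
print(f"RSS = {fit['rss']:.1f}, ESS = {fit['ess']:.1f}")
print(f"95% CI for beta2: ({ci_b2[0]:.4f}, {ci_b2[1]:.4f}); reject beta2 = 3? {reject_b2_eq_3}")
print(f"forecast at 5000: {y0_hat:.1f}; mean CI ({mean_ci[0]:.1f}, {mean_ci[1]:.1f}); "
      f"individual CI ({indiv_ci[0]:.1f}, {indiv_ci[1]:.1f})")
```

The individual interval is always the wider of the two, because its variance carries the extra \(\sigma^2\) contributed by the new disturbance.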

5.10. Refer to Exercise 3.20 and set up the ANOVA tables and test the hypothesis that there 
is no relationship between productivity and real wage compensation. Do this for both 
the business and nonfarm business sectors. 
5.11. Refer to Exercise 1.7.

a. Plot the data with impressions on the vertical axis and advertising expenditure on 
the horizontal axis. What kind of relationship do you observe? 

b. Would it be appropriate to fit a bivariate linear regression model to the data? Why 
or why not? If not, what type of regression model will you fit the data to? Do we 
have the necessary tools to fit such a model? 

c. Suppose you do not plot the data and simply fit the bivariate regression model to 
the data. Obtain the usual regression output. Save the results for a later look at this 
problem. 

5.12. Refer to Exercise 1.1. 

a. Plot the U.S. Consumer Price Index (CPI) against the Canadian CPI. What does 
the plot show? 

b. Suppose you want to predict the U.S. CPI on the basis of the Canadian CPI. 
Develop a suitable model. 

c. Test the hypothesis that there is no relationship between the two CPIs. Use
\(\alpha = 5\%\). If you reject the null hypothesis, does that mean the Canadian CPI
“causes” the U.S. CPI? Why or why not?

5.13. Refer to Problem 3.22. 

a. Estimate the two regressions given there, obtaining standard errors and the other 
usual output. 

b. Test the hypothesis that the disturbances in the two regression models are 
normally distributed. 

c. In the gold price regression, test the hypothesis that \(\beta_2 = 1\), that is, there is a one-
to-one relationship between gold prices and CPI (i.e., gold is a perfect hedge). What
is the p value of the estimated test statistic?

d. Repeat step (c) for the NYSE Index regression. Is investment in the stock market 
a perfect hedge against inflation? What is the null hypothesis you are testing? 
What is its p value? 

e. Between gold and stock, which investment would you choose? What is the basis 
of your decision? 

5.14. Table 5.6 gives data on GNP and four definitions of the money stock for the United 
States for 1970-1983. Regressing GNP on the various definitions of money, we 
obtain the results shown in Table 5.7. 

The monetarists or quantity theorists maintain that nominal income (i.e., nominal 
GNP) is largely determined by changes in the quantity or the stock of money, although 
there is no consensus as to the “right” definition of money. Given the results in the 
preceding table, consider these questions: 

a. Which definition of money seems to be closely related to nominal GNP? 

b. Since the \(r^2\) values are uniformly high, does this fact mean that our choice for
definition of money does not matter?

c. If the Fed wants to control the money supply, which one of these money measures 
is a better target for that purpose? Can you tell from the regression results? 

5.15. Suppose the equation of an indifference curve between two goods is 

\[ X_i Y_i = \beta_1 + \beta_2 X_i \]

How would you estimate the parameters of this model? Apply the preceding model 
to the data in Table 5.8 and comment on your results. 
TABLE 5.6
GNP and Four Measures of Money Stock

                           Money Stock Measure, $ billion
Year   GNP,        M1       M2         M3         L
       $ billion
1970   992.70      216.6    628.2      677.5      816.3
1971   1,077.6     230.8    712.8      776.2      903.1
1972   1,185.9     252.0    805.2      886.0      1,023.0
1973   1,326.4     265.9    861.0      985.0      1,141.7
1974   1,434.2     277.6    908.5      1,070.5    1,249.3
1975   1,549.2     291.2    1,023.3    1,174.2    1,367.9
1976   1,718.0     310.4    1,163.6    1,311.9    1,516.6
1977   1,918.3     335.4    1,286.7    1,472.9    1,704.7
1978   2,163.9     363.1    1,389.1    1,647.1    1,910.6
1979   2,417.8     389.1    1,498.5    1,804.8    2,117.1
1980   2,631.7     414.9    1,632.6    1,990.0    2,326.2
1981   2,957.8     441.9    1,796.6    2,238.2    2,599.8
1982   3,069.3     480.5    1,965.4    2,462.5    2,870.8
1983   3,304.8     525.4    2,196.3    2,710.4    3,183.1

M1 = Currency + Demand deposits + Travelers checks and other checkable deposits (OCDs).

M2 = M1 + Overnight RPs and Eurodollars + MMMF (Money market mutual fund) balances + MMDAs (Money market
deposit accounts) + Savings and small deposits.

M3 = M2 + Large time deposits + Term RPs + Institutional MMMF.


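TABLE 5.7
GNP-Money Stock Regressions, 1970-1983

1) \[ \text{GNP}_t = \underset{(77.9664)}{-787.4723} + \underset{(0.2197)}{8.0863}\,M_{1t} \qquad r^2 = 0.9912 \]

2) \[ \text{GNP}_t = \underset{(61.0134)}{-44.0626} + \underset{(0.0448)}{1.5875}\,M_{2t} \qquad r^2 = 0.9905 \]

3) \[ \text{GNP}_t = \underset{(42.9882)}{159.1366} + \underset{(0.0262)}{1.2034}\,M_{3t} \qquad r^2 = 0.9943 \]

4) \[ \text{GNP}_t = \underset{(44.7658)}{164.2071} + \underset{(0.0234)}{1.0290}\,L_t \qquad r^2 = 0.9938 \]

Note: The figures in parentheses are the estimated standard errors.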
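As a check on Table 5.7, the four regressions can be re-estimated from the Table 5.6 data. The sketch below, in plain Python with a helper `simple_ols` of our own naming, should reproduce the reported coefficients, standard errors, and \(r^2\) values up to rounding, provided the transcription of Table 5.6 is exact.

```python
import math

# Table 5.6, 1970-1983 ($ billion)
gnp = [992.7, 1077.6, 1185.9, 1326.4, 1434.2, 1549.2, 1718.0,
       1918.3, 2163.9, 2417.8, 2631.7, 2957.8, 3069.3, 3304.8]
money = {
    "M1": [216.6, 230.8, 252.0, 265.9, 277.6, 291.2, 310.4,
           335.4, 363.1, 389.1, 414.9, 441.9, 480.5, 525.4],
    "M2": [628.2, 712.8, 805.2, 861.0, 908.5, 1023.3, 1163.6,
           1286.7, 1389.1, 1498.5, 1632.6, 1796.6, 1965.4, 2196.3],
    "M3": [677.5, 776.2, 886.0, 985.0, 1070.5, 1174.2, 1311.9,
           1472.9, 1647.1, 1804.8, 1990.0, 2238.2, 2462.5, 2710.4],
    "L": [816.3, 903.1, 1023.0, 1141.7, 1249.3, 1367.9, 1516.6,
          1704.7, 1910.6, 2117.1, 2326.2, 2599.8, 2870.8, 3183.1],
}


def simple_ols(x, y):
    """Return (b1, b2, se_b1, se_b2, r2) for the regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_b1 = math.sqrt(sigma2 * sum(xi * xi for xi in x) / (n * sxx))
    se_b2 = math.sqrt(sigma2 / sxx)
    return b1, b2, se_b1, se_b2, 1 - rss / syy


results = {name: simple_ols(m, gnp) for name, m in money.items()}
for name, (b1, b2, se1, se2, r2) in results.items():
    print(f"GNP = {b1:10.4f} + {b2:.4f} {name}   r2 = {r2:.4f}")
    print(f"      ({se1:.4f})   ({se2:.4f})")
```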




TABLE 5.8 


Consumption of good X: 1 2 3 4 5 

Consumption of good Y : 4 3.5 2.8 1.9 0.8 
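Because \(X_iY_i = \beta_1 + \beta_2 X_i\) is linear in the parameters, one hedged way to proceed in Exercise 5.15 is to define \(Z_i = X_iY_i\) and regress \(Z_i\) on \(X_i\). Dividing through by \(X_i\) instead gives \(Y_i = \beta_2 + \beta_1(1/X_i)\), a reciprocal form that could also be fitted; the two forms imply different assumptions about how the error term enters, which the exercise leaves open. The sketch fits both to the Table 5.8 data; the helper `ols_line` is our own.

```python
# Table 5.8: consumption of goods X and Y along one indifference curve
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [4.0, 3.5, 2.8, 1.9, 0.8]


def ols_line(u, v):
    """Return (intercept, slope, r2) of the OLS regression of v on u."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suu = sum((a - ubar) ** 2 for a in u)
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    svv = sum((b - vbar) ** 2 for b in v)
    slope = suv / suu
    return vbar - slope * ubar, slope, slope * suv / svv


# Form 1: Z_i = X_i * Y_i on X_i; intercept estimates beta1, slope estimates beta2.
z = [a * b for a, b in zip(x, y)]
b1_z, b2_z, r2_z = ols_line(x, z)

# Form 2: Y_i on 1/X_i; intercept estimates beta2, slope estimates beta1.
inv_x = [1.0 / a for a in x]
b2_inv, b1_inv, r2_inv = ols_line(inv_x, y)

print(f"Z on X:   beta1 = {b1_z:.4f}, beta2 = {b2_z:.4f}, r2 = {r2_z:.4f}")
print(f"Y on 1/X: beta1 = {b1_inv:.4f}, beta2 = {b2_inv:.4f}, r2 = {r2_inv:.4f}")
```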


5.16. Since 1986 the Economist has been publishing the Big Mac Index as a crude, and hilarious,
measure of whether international currencies are at their “correct” exchange
rate, as judged by the theory of purchasing power parity (PPP). The PPP holds that
a unit of currency should be able to buy the same bundle of goods in all countries.
The proponents of PPP argue that, in the long run, currencies tend to move toward
their PPP. The Economist uses McDonald’s Big Mac as a representative bundle and
gives the information in Table 5.9.

Consider the following regression model:

\[ Y_i = \beta_1 + \beta_2 X_i + u_i \]

where Y = actual exchange rate and X = implied PPP of the dollar.
a. If the PPP holds, what values of \(\beta_1\) and \(\beta_2\) would you expect a priori?












TABLE 5.9
The Hamburger Standard

                 Big Mac Prices                          Actual Dollar  Under(-)/Over(+)
                 In Local        In       Implied PPP*  Exchange Rate, Valuation against
Country          Currency        Dollars  of the Dollar Jan 31st       the Dollar, %
United States†   $3.22           3.22
Argentina        Peso 8.25       2.65     2.56          3.11           -18
Australia        A$3.45          2.67     1.07          1.29           -17
Brazil           Real 6.4        3.01     1.99          2.13           -6
Britain          £1.99           3.90     1.62‡         1.96‡          +21
Canada           C$3.63          3.08     1.13          1.18           -4
Chile            Peso 1,670      3.07     519           544            -5
China            Yuan 11.0       1.41     3.42          7.77           -56
Colombia         Peso 6,900      3.06     2,143         2,254          -5
Costa Rica       Colones 1,130   2.18     351           519            -32
Czech Republic   Koruna 52.1     2.41     16.2          21.6           -25
Denmark          DKr27.75        4.84     8.62          5.74           +50
Egypt            Pound 9.09      1.60     2.82          5.70           -50
Estonia          Kroon 30        2.49     9.32          12.0           -23
Euro area§       €2.94           3.82     1.10**        1.30**         +19
Hong Kong        HK$12.0         1.54     3.73          7.81           -52
Hungary          Forint 590      3.00     183           197            -7
Iceland          Kronur 509      7.44     158           68.4           +131
Indonesia        Rupiah 15,900   1.75     4,938         9,100          -46
Japan            ¥280            2.31     87.0          121            -28
Latvia           Lats 1.35       2.52     0.42          0.54           -22
Lithuania        Litas 6.50      2.45     2.02          2.66           -24
Malaysia         Ringgit 5.50    1.57     1.71          3.50           -51
Mexico           Peso 29.0       2.66     9.01          10.9           -17
New Zealand      NZ$4.60         3.16     1.43          1.45           -2
Norway           Kroner 41.5     6.63     12.9          6.26           +106
Pakistan         Rupee 140       2.31     43.5          60.7           -28
Paraguay         Guarani 10,000  1.90     3,106         5,250          -41
Peru             New Sol 9.50    2.97     2.95          3.20           -8
Philippines      Peso 85.0       1.74     26.4          48.9           -46
Poland           Zloty 6.90      2.29     2.14          3.01           -29
Russia           Rouble 49.0     1.85     15.2          26.5           -43
Saudi Arabia     Riyal 9.00      2.40     2.80          3.75           -25
Singapore        S$3.60          2.34     1.12          1.54           -27
Slovakia         Crown 57.98     2.13     18.0          27.2           -34
South Africa     Rand 15.5       2.14     4.81          7.25           -34
South Korea      Won 2,900       3.08     901           942            -4
Sri Lanka        Rupee 190       1.75     59.0          109            -46
Sweden           SKr32.0         4.59     9.94          6.97           +43
Switzerland      SFr6.30         5.05     1.96          1.25           +57
Taiwan           NT$75.0         2.28     23.3          32.9           -29
Thailand         Baht 62.0       1.78     19.3          34.7           -45
Turkey           Lire 4.55       3.22     1.41          1.41           nil
UAE              Dirhams 10.0    2.72     3.11          3.67           -15
Ukraine          Hryvnia 9.00    1.71     2.80          5.27           -47
Uruguay          Peso 55.0       2.17     17.1          25.3           -33
Venezuela        Bolivar 6,800   1.58     2,112         4,307          -51

*Purchasing power parity: local price divided by price in United States.
†Average of New York, Chica
‡Dollars per pound.
§Weighted average of prices i








Chapter 5 Two-Variable Regression: Interval Estimation and Hypothesis Testing 141 


b. Do the regression results support your expectation? What formal test do you use 
to test your hypothesis? 

c. Should the Economist continue to publish the Big Mac Index? Why or why not? 

5.17. Refer to the SAT data given in Exercise 2.16. Suppose you want to predict the male
math (Y) scores on the basis of the female math scores (X) by running the following
regression:

\[ Y_t = \beta_1 + \beta_2 X_t + u_t \]

a. Estimate the preceding model. 

b. From the estimated residuals, find out if the normality assumption can be 
sustained. 

c. Now test the hypothesis that \(\beta_2 = 1\), that is, there is a one-to-one correspondence
between male and female math scores.

d. Set up the ANOVA table for this problem. 

5.18. Repeat the exercise in the preceding problem but let Y and X denote the male and female
critical reading scores, respectively.

5.19. Table 5.10 gives annual data on the Consumer Price Index (CPI) and the Wholesale 
Price Index (WPI), also called Producer Price Index (PPI), for the U.S. economy for 
the period 1980-2006. 


TABLE 5.10
CPI and PPI, USA, 1980-2006

Year    CPI (Total)    PPI (Total Finished Goods)
1980    82.4           88.0
1981    90.9           96.1
1982    96.5           100.0
1983    99.6           101.6
1984    103.9          103.7
1985    107.6          104.7
1986    109.6          103.2
1987    113.6          105.4
1988    118.3          108.0
1989    124.0          113.6
1990    130.7          119.2
1991    136.2          121.7
1992    140.3          123.2
1993    144.5          124.7
1994    148.2          125.5
1995    152.4          127.9
1996    156.9          131.3
1997    160.5          131.8
1998    163.0          130.7
1999    166.6          133.0
2000    172.2          138.0
2001    177.1          140.7
2002    179.9          138.9
2003    184.0          143.3
2004    188.9          148.5
2005    195.3          155.7
2006    201.6          160.3

Source: Economic Report of the
and B-65.
a. Plot the CPI on the vertical axis and the WPI on the horizontal axis. A priori, what
kind of relationship do you expect between the two indexes? Why? 

b. Suppose you want to predict one of these indexes on the basis of the other index. 
Which will you use as the regressand and which as the regressor? Why? 

c. Run the regression you have decided in (b). Show the standard output. Test the 
hypothesis that there is a one-to-one relationship between the two indexes. 

d. From the residuals obtained from the regression in (c), can you entertain the 
hypothesis that the true error term is normally distributed? Show the tests you use. 
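A sketch for parts (c) and (d), assuming, purely for illustration, that CPI is the regressand and PPI the regressor (part (b) asks you to decide this for yourself). The Jarque-Bera statistic is \(JB = n[S^2/6 + (K-3)^2/24]\), where S is the skewness and K the kurtosis of the residuals. Asymptotically it is \(\chi^2\) with 2 df, and the upper-tail area of that distribution is exactly \(e^{-JB/2}\), which is what the code uses for the p value. With only 27 observations the test is at best a rough guide. The critical t value 2.060 for 25 df is an assumed table value.

```python
import math

# Table 5.10, 1980-2006
cpi = [82.4, 90.9, 96.5, 99.6, 103.9, 107.6, 109.6, 113.6, 118.3, 124.0,
       130.7, 136.2, 140.3, 144.5, 148.2, 152.4, 156.9, 160.5, 163.0, 166.6,
       172.2, 177.1, 179.9, 184.0, 188.9, 195.3, 201.6]
ppi = [88.0, 96.1, 100.0, 101.6, 103.7, 104.7, 103.2, 105.4, 108.0, 113.6,
       119.2, 121.7, 123.2, 124.7, 125.5, 127.9, 131.3, 131.8, 130.7, 133.0,
       138.0, 140.7, 138.9, 143.3, 148.5, 155.7, 160.3]

n = len(cpi)
xbar, ybar = sum(ppi) / n, sum(cpi) / n
sxx = sum((x - xbar) ** 2 for x in ppi)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(ppi, cpi))
b2 = sxy / sxx
b1 = ybar - b2 * xbar
resid = [y - b1 - b2 * x for x, y in zip(ppi, cpi)]
sigma2 = sum(e * e for e in resid) / (n - 2)
se_b2 = math.sqrt(sigma2 / sxx)

# (c) t test of H0: beta2 = 1 (one-to-one relationship)
T_CRIT_25 = 2.060  # assumed two-tailed 5% critical t for n - 2 = 25 df (table value)
t_one_to_one = (b2 - 1.0) / se_b2
reject_one_to_one = abs(t_one_to_one) > T_CRIT_25


def jarque_bera(u):
    """Return (JB statistic, asymptotic p value) for the series u."""
    m = len(u)
    mean = sum(u) / m
    m2 = sum((e - mean) ** 2 for e in u) / m
    m3 = sum((e - mean) ** 3 for e in u) / m
    m4 = sum((e - mean) ** 4 for e in u) / m
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = m * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)
    return jb, math.exp(-jb / 2)  # chi-square(2) upper-tail area


# (d) normality of the residuals
jb, jb_p = jarque_bera(resid)

print(f"CPI = {b1:.4f} + {b2:.4f} PPI   (se of slope {se_b2:.4f})")
print(f"t for beta2 = 1: {t_one_to_one:.3f}; reject at 5%? {reject_one_to_one}")
print(f"Jarque-Bera = {jb:.3f}, asymptotic p value = {jb_p:.4f}")
```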

5.20. Table 5.11 provides data on the lung cancer mortality index (100 = average) and the
smoking index (100 = average) for 25 occupational groups.

a. Plot the cancer mortality index against the smoking index. What general pattern 
do you observe? 

b. Letting Y = cancer mortality index and X = smoking index, estimate a linear
regression model and obtain the usual regression statistics.

c. Test the hypothesis that smoking has no influence on lung cancer at \(\alpha = 5\%\).

d. Which are the risky occupations in terms of lung cancer mortality? Can you give 
some reasons why this might be so? 

e. Is there any way to bring occupation category explicitly into the regression 
analysis? 
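For parts (b) and (c), the regression of the cancer index on the smoking index and its ANOVA table can be sketched as follows. The printed table follows the usual decomposition of TSS into ESS and RSS; the critical value 2.069 (5 percent, two-tailed, 23 df) is an assumed table value. The F ratio computed this way equals the square of the t ratio for \(H_0: \beta_2 = 0\), as Theorem 5.7 leads one to expect.

```python
import math

# Table 5.11, in table order: smoking index and lung cancer mortality index
smoking = [77, 137, 117, 94, 116, 102, 111, 93, 88, 102, 91, 104, 107,
           112, 113, 110, 125, 113, 115, 105, 87, 91, 100, 76, 66]
cancer = [84, 116, 123, 128, 155, 101, 118, 113, 104, 88, 104, 129, 86,
          96, 144, 139, 113, 146, 128, 115, 79, 85, 120, 60, 51]

n = len(smoking)
xbar, ybar = sum(smoking) / n, sum(cancer) / n
sxx = sum((x - xbar) ** 2 for x in smoking)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(smoking, cancer))
tss = sum((y - ybar) ** 2 for y in cancer)
b2 = sxy / sxx
b1 = ybar - b2 * xbar
ess = b2 ** 2 * sxx
rss = tss - ess
sigma2 = rss / (n - 2)
t_b2 = b2 / math.sqrt(sigma2 / sxx)      # t ratio for H0: beta2 = 0
f_stat = (ess / 1) / (rss / (n - 2))     # ANOVA F ratio, cf. Eq. (5.9.1)
T_CRIT_23 = 2.069  # assumed two-tailed 5% critical t for 23 df (table value)

print(f"Cancer = {b1:.4f} + {b2:.4f} Smoking")
print("Source of variation       SS          df   MSS")
print(f"Due to regression (ESS)  {ess:10.2f}   1   {ess:10.2f}")
print(f"Due to residuals (RSS)   {rss:10.2f}  {n - 2}   {sigma2:10.2f}")
print(f"Total (TSS)              {tss:10.2f}  {n - 1}")
print(f"F = {f_stat:.3f}, t = {t_b2:.3f}, reject H0 at 5%? {abs(t_b2) > T_CRIT_23}")
```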


TABLE 5.11
Smoking and Lung Cancer

Occupation                                      Smoking   Lung Cancer
Farmers, foresters, fishermen                   77        84
Miners and quarrymen                            137       116
Gas, coke, and chemical makers                  117       123
Glass and ceramic makers                        94        128
Furnace forge foundry workers                   116       155
Electrical and electronic workers               102       101
Engineering and allied trades                   111       118
Wood workers                                    93        113
Leather workers                                 88        104
Textile workers                                 102       88
Clothing workers                                91        104
Food, drink, and tobacco workers                104       129
Paper and printing workers                      107       86
Makers of other products                        112       96
Construction workers                            113       144
Painters and decorators                         110       139
Drivers of engines, cranes, etc.                125       113
Laborers not included elsewhere                 113       146
Transportation, and communication workers       115       128
Warehousemen, store keepers, etc.               105       115
Clerical workers                                87        79
Sales workers                                   91        85
Service, sports, recreation workers             100       120
Administrators and managers                     76        60
Artists and professional and technical workers  66        51

Source: http://lib.stat.cmu.edu/DASL/Datafiles/SmokingandCancer.html.
Appendix 5A


5A.1 Probability Distributions Related 
to the Normal Distribution 


The t, chi-square (\(\chi^2\)), and F probability distributions, whose salient features are discussed in
Appendix A, are intimately related to the normal distribution. Since we will make heavy use of these
probability distributions in the following chapters, we summarize their relationship with the normal
distribution in the following theorems; the proofs, which are beyond the scope of this book, can be
found in the references. 1

Theorem 5.1. If \(Z_1, Z_2, \ldots, Z_n\) are normally and independently distributed random
variables such that \(Z_i \sim N(\mu_i, \sigma_i^2)\), then the sum \(Z = \sum k_i Z_i\), where \(k_i\) are constants not all
zero, is also distributed normally with mean \(\sum k_i \mu_i\) and variance \(\sum k_i^2 \sigma_i^2\); that is,

\[ Z \sim N\left(\sum k_i \mu_i,\ \sum k_i^2 \sigma_i^2\right) \]

Note: \(\mu\) denotes the mean value.

In short, linear combinations of normal variables are themselves normally distributed. For example,
if \(Z_1\) and \(Z_2\) are normally and independently distributed as \(Z_1 \sim N(10, 2)\) and \(Z_2 \sim N(8, 1.5)\),
then the linear combination \(Z = 0.8Z_1 + 0.2Z_2\) is also normally distributed with mean = 0.8(10) +
0.2(8) = 9.6 and variance = 0.64(2) + 0.04(1.5) = 1.34, that is, \(Z \sim N(9.6, 1.34)\).

Theorem 5.2. If \(Z_1, Z_2, \ldots, Z_n\) are jointly normally distributed but are not independent, the sum
\(Z = \sum k_i Z_i\), where \(k_i\) are constants not all zero, is also normally distributed with mean
\(\sum k_i \mu_i\) and variance \(\left[\sum k_i^2 \sigma_i^2 + 2\sum_{i<j} k_i k_j \operatorname{cov}(Z_i, Z_j)\right]\).

Thus, if \(Z_1 \sim N(6, 2)\) and \(Z_2 \sim N(7, 3)\) and \(\operatorname{cov}(Z_1, Z_2) = 0.8\), then the linear combination
\(0.6Z_1 + 0.4Z_2\) is also normally distributed with mean = 0.6(6) + 0.4(7) = 6.4 and variance =
[0.36(2) + 0.16(3) + 2(0.6)(0.4)(0.8)] = 1.584.
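The two worked examples above can be checked with a few lines of Python. The helper below (a hypothetical name of our own) sums \(k_i k_j \operatorname{cov}(Z_i, Z_j)\) over all pairs i, j, which equals \(\sum k_i^2\sigma_i^2 + 2\sum_{i<j} k_ik_j\operatorname{cov}(Z_i, Z_j)\); the independent case of Theorem 5.1 is simply the special case of a diagonal covariance matrix.

```python
import math


def linear_combination_moments(k, mu, cov):
    """Mean and variance of Z = sum_i k_i Z_i, given means mu and covariance matrix cov.

    Summing k_i * k_j * cov[i][j] over all (i, j) counts each off-diagonal
    covariance twice, which reproduces the factor 2 in Theorem 5.2.
    """
    m = len(k)
    mean = sum(k[i] * mu[i] for i in range(m))
    var = sum(k[i] * k[j] * cov[i][j] for i in range(m) for j in range(m))
    return mean, var


# Theorem 5.1 example: independent variables, so off-diagonal covariances are zero
mean1, var1 = linear_combination_moments([0.8, 0.2], [10, 8], [[2, 0], [0, 1.5]])
# Theorem 5.2 example: cov(Z1, Z2) = 0.8
mean2, var2 = linear_combination_moments([0.6, 0.4], [6, 7], [[2, 0.8], [0.8, 3]])
print(f"Theorem 5.1 example: mean {mean1:.4f}, variance {var1:.4f}")
print(f"Theorem 5.2 example: mean {mean2:.4f}, variance {var2:.4f}")
```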

Theorem 5.3. If \(Z_1, Z_2, \ldots, Z_n\) are normally and independently distributed random
variables such that each \(Z_i \sim N(0, 1)\), that is, a standardized normal variable, then \(\sum Z_i^2 =
Z_1^2 + Z_2^2 + \cdots + Z_n^2\) follows the chi-square distribution with n df. Symbolically, \(\sum Z_i^2 \sim \chi^2_n\),
where n denotes the degrees of freedom, df.

In short, “the sum of the squares of independent standard normal variables has a chi-square 
distribution with degrees of freedom equal to the number of terms in the sum.” 2 

Theorem 5.4. If \(Z_1, Z_2, \ldots, Z_n\) are independently distributed random variables each
following chi-square distribution with \(k_i\) df, then the sum \(\sum Z_i = Z_1 + Z_2 + \cdots + Z_n\) also
follows a chi-square distribution with \(k = \sum k_i\) df.

Thus, if \(Z_1\) and \(Z_2\) are independent \(\chi^2\) variables with df of \(k_1\) and \(k_2\), respectively, then
\(Z = Z_1 + Z_2\) is also a \(\chi^2\) variable with \((k_1 + k_2)\) degrees of freedom. This is called the reproductive
property of the \(\chi^2\) distribution.
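Theorems 5.3 and 5.4 can be illustrated by simulation: build chi-square variables as sums of squared standard normal draws, add two independent ones, and compare the sample mean and variance with those of a chi-square variable with \(k_1 + k_2\) df, namely \(k_1 + k_2\) and \(2(k_1 + k_2)\). The seed, the sample size, and the df below are arbitrary choices, and the printed figures vary slightly with them.

```python
import random
import statistics

random.seed(42)


def chi_square_draw(k):
    """Sum of k squared independent N(0, 1) draws (Theorem 5.3)."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))


K1, K2, N = 3, 4, 50_000
# Reproductive property (Theorem 5.4): chi2(3) + chi2(4) should behave like chi2(7),
# whose mean is 7 and whose variance is 2 * 7 = 14.
draws = [chi_square_draw(K1) + chi_square_draw(K2) for _ in range(N)]
sim_mean = statistics.mean(draws)
sim_var = statistics.pvariance(draws)
print(f"simulated mean {sim_mean:.3f} (theory 7), variance {sim_var:.3f} (theory 14)")
```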


1 For proofs of the various theorems, see Alexander M. Mood, Franklin A. Graybill, and Duane C. Boes,
Introduction to the Theory of Statistics, 3d ed., McGraw-Hill, New York, 1974, pp. 239-249.

2 Ibid., p. 243.
Theorem 5.5. If \(Z_1\) is a standardized normal variable [\(Z_1 \sim N(0, 1)\)] and another variable
\(Z_2\) follows the chi-square distribution with k df and is independent of \(Z_1\), then the variable
defined as

\[ t = \frac{Z_1}{\sqrt{Z_2/k}} = \frac{Z_1\sqrt{k}}{\sqrt{Z_2}} = \frac{\text{Standard normal variable}}{\sqrt{\text{Independent chi-square variable}/\text{df}}} \sim t_k \]

follows Student’s t distribution with k df. Note: This distribution is discussed in Appendix A
and is illustrated in Chapter 5.

Incidentally, note that as k, the df, increases indefinitely (i.e., as \(k \to \infty\)), the Student’s t distribution
approaches the standardized normal distribution. 3 As a matter of convention, the notation \(t_k\)
means Student’s t distribution or variable with k df.

Theorem 5.6. If \(Z_1\) and \(Z_2\) are independently distributed chi-square variables with \(k_1\) and
\(k_2\) df, respectively, then the variable

\[ F = \frac{Z_1/k_1}{Z_2/k_2} \sim F_{k_1, k_2} \]

has the F distribution with \(k_1\) and \(k_2\) degrees of freedom, where \(k_1\) is known as the numerator
degrees of freedom and \(k_2\) the denominator degrees of freedom.


Again as a matter of convention, the notation \(F_{k_1, k_2}\) means an F variable with \(k_1\) and \(k_2\) degrees of
freedom, the df in the numerator being quoted first.

In other words, Theorem 5.6 states that the F variable is simply the ratio of two independently distributed
chi-square variables, each divided by its respective degrees of freedom.


Theorem 5.7. The square of (Student’s) t variable with k df has an F distribution with \(k_1 =
1\) df in the numerator and \(k_2 = k\) df in the denominator. 4 That is,

\[ F_{1,k} = t_k^2 \]

Note that for this equality to hold, the numerator df of the F variable must be 1. Thus,
\(F_{1,4} = t_4^2\) or \(F_{1,23} = t_{23}^2\) and so on.
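A simulation sketch of Theorem 5.7: square t variables constructed as in Theorem 5.5, build F variables with 1 and k df as in Theorem 5.6 from separate, independent draws, and compare their 95th percentiles. With k = 23 both should be close to the tabulated 5 percent critical value \(F_{1,23} \approx 4.28\), which is \((2.069)^2\), the square of the two-tailed 5 percent t value with 23 df. Chi-square draws use Python's `random.gammavariate(df / 2, 2)`, since a chi-square variable with df degrees of freedom is a gamma variable with shape df/2 and scale 2. The seed and sample size are arbitrary.

```python
import math
import random

random.seed(7)
K, N = 23, 50_000


def chi2(df):
    """Chi-square draw with df degrees of freedom: Gamma(shape=df/2, scale=2)."""
    return random.gammavariate(df / 2.0, 2.0)


# Theorem 5.5: t = Z / sqrt(chi2_k / k); keep t squared.
t_squared = [(random.gauss(0.0, 1.0) / math.sqrt(chi2(K) / K)) ** 2 for _ in range(N)]
# Theorem 5.6: F = (chi2_1 / 1) / (chi2_k / k), built from independent draws.
f_draws = [(chi2(1) / 1) / (chi2(K) / K) for _ in range(N)]


def quantile(xs, p):
    """Simple empirical quantile: the value at rank floor(p * n) after sorting."""
    ordered = sorted(xs)
    return ordered[int(p * len(ordered))]


q_t2, q_f = quantile(t_squared, 0.95), quantile(f_draws, 0.95)
print(f"95th percentile: t^2 -> {q_t2:.3f}, F(1, {K}) -> {q_f:.3f}")
```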

As noted, we will see the practical utility of the preceding theorems as we progress. 


Theorem 5.8. For large denominator df, the numerator df times the F value is approximately
equal to the chi-square value with the numerator df. Thus,

\[ m F_{m,n} \approx \chi^2_m \quad \text{for large } n \]

Theorem 5.9. For sufficiently large df, the chi-square distribution can be approximated by
the standard normal distribution as follows:

\[ Z = \sqrt{2\chi^2} - \sqrt{2k - 1} \sim N(0, 1) \]

where k denotes df.
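Theorem 5.9 can be checked the same way. With k = 100 (an arbitrary choice), the transformed variable should have mean close to 0, variance close to 1, and a probability of exceeding 1.645 close to 5 percent; small discrepancies reflect both sampling error and the fact that the approximation is not exact at finite k. Chi-square draws again use `random.gammavariate` with shape k/2 and scale 2.

```python
import math
import random
import statistics

random.seed(11)
K, N = 100, 50_000

# Z = sqrt(2 * chi2_k) - sqrt(2k - 1), with chi2_k drawn as Gamma(K/2, 2)
z = [math.sqrt(2 * random.gammavariate(K / 2, 2.0)) - math.sqrt(2 * K - 1)
     for _ in range(N)]

z_mean = statistics.mean(z)
z_var = statistics.pvariance(z)
tail = sum(1 for v in z if v > 1.645) / N   # upper 5% point of N(0, 1) is about 1.645
print(f"mean {z_mean:.4f}, variance {z_var:.4f}, P(Z > 1.645) = {tail:.4f}")
```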


3 For proof, see Henri Theil, Introduction to Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1978, 
pp. 237-245. 

4 For proof, see Eqs. (5.3.2) and (5.9.1). 
5A.2 Derivation of Equation (5.3.2)


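Let

\[ Z_1 = \frac{\hat\beta_2 - \beta_2}{\operatorname{se}(\hat\beta_2)} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\sigma} \tag{1} \]

and

\[ Z_2 = (n - 2)\frac{\hat\sigma^2}{\sigma^2} \tag{2} \]

Provided \(\sigma\) is known, \(Z_1\) follows the standardized normal distribution; that is, \(Z_1 \sim N(0, 1)\).
(Why?) \(Z_2\) follows the \(\chi^2\) distribution with (n - 2) df. 5 Furthermore, it can be shown that \(Z_2\) is
distributed independently of \(Z_1\). 6 Therefore, by virtue of Theorem 5.5, the variable

\[ t = \frac{Z_1}{\sqrt{Z_2/(n - 2)}} \tag{3} \]

follows the t distribution with n - 2 df. Substitution of Eqs. (1) and (2) into Eq. (3) gives Eq. (5.3.2).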
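The result of Section 5A.2 can be illustrated by Monte Carlo. The sketch below assumes hypothetical true values \(\beta_1 = 1\), \(\beta_2 = 0.5\), \(\sigma = 2\) and fixed X values 1 to 10, and computes the t ratio of Eq. (3) in each replication, with \(\sigma\) replaced by \(\hat\sigma\). Using the t critical value with n - 2 = 8 df (2.306, an assumed table value) should reject the true null about 5 percent of the time, whereas the normal cutoff 1.96 rejects it noticeably more often.

```python
import math
import random

random.seed(2)
BETA1, BETA2, SIGMA = 1.0, 0.5, 2.0     # hypothetical true parameters
X = [float(i) for i in range(1, 11)]    # fixed regressor values, n = 10
n = len(X)
xbar = sum(X) / n
sxx = sum((x - xbar) ** 2 for x in X)
T_CRIT_8 = 2.306  # assumed two-tailed 5% critical t for n - 2 = 8 df (table value)
R = 20_000

reject_t = reject_z = 0
for _ in range(R):
    y = [BETA1 + BETA2 * x + random.gauss(0.0, SIGMA) for x in X]
    ybar = sum(y) / n
    b2 = sum((x - xbar) * (yi - ybar) for x, yi in zip(X, y)) / sxx
    b1 = ybar - b2 * xbar
    sigma2_hat = sum((yi - b1 - b2 * x) ** 2 for x, yi in zip(X, y)) / (n - 2)
    t = (b2 - BETA2) / math.sqrt(sigma2_hat / sxx)   # sigma replaced by sigma-hat
    reject_t += abs(t) > T_CRIT_8                    # correct t(8) cutoff
    reject_z += abs(t) > 1.96                        # normal cutoff, too small here

rate_t, rate_z = reject_t / R, reject_z / R
print(f"rejection rate with t(8) cutoff: {rate_t:.4f}; with N(0, 1) cutoff: {rate_z:.4f}")
```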


5A.3 Derivation of Equation (5.9.1) 


Equation (1) shows that \(Z_1 \sim N(0, 1)\). Therefore, by Theorem 5.3, the quantity

\[ Z_1^2 = \frac{(\hat\beta_2 - \beta_2)^2 \sum x_i^2}{\sigma^2} \]

follows the \(\chi^2\) distribution with 1 df. As noted in Section 5A.2,

\[ Z_2 = (n - 2)\frac{\hat\sigma^2}{\sigma^2} = \frac{\sum \hat u_i^2}{\sigma^2} \]

also follows the \(\chi^2\) distribution with n - 2 df. Moreover, as noted in Section 4.3, \(Z_2\) is distributed
independently of \(Z_1\). Then from Theorem 5.6, it follows that

\[ F = \frac{Z_1^2/1}{Z_2/(n - 2)} = \frac{(\hat\beta_2 - \beta_2)^2 \left(\sum x_i^2\right)}{\sum \hat u_i^2/(n - 2)} \]

follows the F distribution with 1 and n - 2 df, respectively. Under the null hypothesis \(H_0: \beta_2 = 0\), the
preceding F ratio reduces to Eq. (5.9.1).
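Because the F ratio above is the square of the t variable in Eq. (3) of Section 5A.2, it equals \(t^2\) for any hypothesized value of \(\beta_2\), in line with Theorem 5.7. A quick numerical check, using the Table 5.8 numbers merely as convenient data:

```python
import math

# Table 5.8 values, used here only as convenient numbers
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [4.0, 3.5, 2.8, 1.9, 0.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
rss = sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, y))

beta2_null = 0.0  # H0: beta2 = 0, the case that gives Eq. (5.9.1)
f_ratio = (b2 - beta2_null) ** 2 * sxx / (rss / (n - 2))          # F with 1 and n - 2 df
t_ratio = (b2 - beta2_null) / math.sqrt((rss / (n - 2)) / sxx)    # t with n - 2 df
print(f"F = {f_ratio:.4f}, t^2 = {t_ratio ** 2:.4f}")
```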

5A.4 Derivations of Equations (5.10.2) and (5.10.6) 

Variance of Mean Prediction 

Given \(X_i = X_0\), the true mean prediction \(E(Y_0 \mid X_0)\) is given by

\[ E(Y_0 \mid X_0) = \beta_1 + \beta_2 X_0 \tag{1} \]

5 For proof, see Robert V. Hogg and Allen T. Craig, Introduction to Mathematical Statistics, 2d ed., 
Macmillan, New York, 1965, p. 144. 

6 For proof, see J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, pp. 181-182.
(Knowledge of matrix algebra is required to follow the proof.) 











146 Part One Single-Equation Regression Models 


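We estimate Eq. (1) from

\[ \hat Y_0 = \hat\beta_1 + \hat\beta_2 X_0 \tag{2} \]

Taking the expectation of Eq. (2), given \(X_0\), we get

\[ E(\hat Y_0) = E(\hat\beta_1) + E(\hat\beta_2) X_0 = \beta_1 + \beta_2 X_0 \]

because \(\hat\beta_1\) and \(\hat\beta_2\) are unbiased estimators. Therefore,

\[ E(\hat Y_0) = E(Y_0 \mid X_0) = \beta_1 + \beta_2 X_0 \tag{3} \]

That is, \(\hat Y_0\) is an unbiased predictor of \(E(Y_0 \mid X_0)\).

Now using the property that var (a + b) = var (a) + var (b) + 2 cov (a, b), we obtain

\[ \operatorname{var}(\hat Y_0) = \operatorname{var}(\hat\beta_1) + \operatorname{var}(\hat\beta_2) X_0^2 + 2\operatorname{cov}(\hat\beta_1, \hat\beta_2) X_0 \tag{4} \]

Using the formulas for variances and covariance of \(\hat\beta_1\) and \(\hat\beta_2\) given in Eqs. (3.3.1), (3.3.3), and
(3.3.9) and manipulating terms, we obtain

\[ \operatorname{var}(\hat Y_0) = \sigma^2 \left[\frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum x_i^2}\right] \tag{5.10.2} \]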


Variance of Individual Prediction 

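We want to predict an individual Y corresponding to \(X = X_0\); that is, we want to obtain

\[ Y_0 = \beta_1 + \beta_2 X_0 + u_0 \tag{5} \]

We predict this as

\[ \hat Y_0 = \hat\beta_1 + \hat\beta_2 X_0 \tag{6} \]

The prediction error, \(Y_0 - \hat Y_0\), is

\[ Y_0 - \hat Y_0 = \beta_1 + \beta_2 X_0 + u_0 - (\hat\beta_1 + \hat\beta_2 X_0) = (\beta_1 - \hat\beta_1) + (\beta_2 - \hat\beta_2) X_0 + u_0 \tag{7} \]

Therefore,

\[ E(Y_0 - \hat Y_0) = E(\beta_1 - \hat\beta_1) + E(\beta_2 - \hat\beta_2) X_0 + E(u_0) = 0 \]

because \(\hat\beta_1\), \(\hat\beta_2\) are unbiased, \(X_0\) is a fixed number, and \(E(u_0)\) is zero by assumption.

Squaring Eq. (7) on both sides and taking expectations, we get \(\operatorname{var}(Y_0 - \hat Y_0) =
\operatorname{var}(\hat\beta_1) + X_0^2 \operatorname{var}(\hat\beta_2) + 2X_0 \operatorname{cov}(\hat\beta_1, \hat\beta_2) + \operatorname{var}(u_0)\); the cross-product terms involving \(u_0\)
vanish because \(u_0\), the disturbance of the new observation, is uncorrelated with the sample disturbances
on which \(\hat\beta_1\) and \(\hat\beta_2\) depend. Using the variance and covariance formulas for \(\hat\beta_1\) and \(\hat\beta_2\) given
earlier, and noting that \(\operatorname{var}(u_0) = \sigma^2\), we obtain

\[ \operatorname{var}(Y_0 - \hat Y_0) = \sigma^2 \left[1 + \frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum x_i^2}\right] \tag{5.10.6} \]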
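Equations (4), (5.10.2), and (5.10.6) can be checked numerically. The sketch below uses hypothetical values (X = 1, ..., 5, \(\sigma^2 = 1\), \(X_0 = 7\), true coefficients 2 and 0.5), confirms that Eq. (4) and Eq. (5.10.2) agree, and then simulates the individual prediction error \(Y_0 - \hat Y_0\), drawing \(u_0\) independently of the sample, to compare its variance with Eq. (5.10.6).

```python
import math
import random
import statistics

X = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical fixed regressor values
SIGMA2 = 1.0                    # hypothetical error variance
X0 = 7.0                        # point at which we forecast
n = len(X)
xbar = sum(X) / n
sxx = sum((x - xbar) ** 2 for x in X)

# Variances and covariance of the OLS estimators, Eqs. (3.3.1), (3.3.3), (3.3.9)
var_b2 = SIGMA2 / sxx
var_b1 = SIGMA2 * sum(x * x for x in X) / (n * sxx)
cov_b1b2 = -xbar * SIGMA2 / sxx

var_mean_eq4 = var_b1 + var_b2 * X0 ** 2 + 2 * cov_b1b2 * X0      # Eq. (4)
var_mean_5102 = SIGMA2 * (1 / n + (X0 - xbar) ** 2 / sxx)         # Eq. (5.10.2)
var_indiv_5106 = SIGMA2 * (1 + 1 / n + (X0 - xbar) ** 2 / sxx)    # Eq. (5.10.6)

# Monte Carlo check of the individual prediction error Y0 - Y0_hat
random.seed(3)
B1, B2, R = 2.0, 0.5, 20_000  # hypothetical true coefficients
errors = []
for _ in range(R):
    y = [B1 + B2 * x + random.gauss(0.0, math.sqrt(SIGMA2)) for x in X]
    ybar = sum(y) / n
    b2 = sum((x - xbar) * (yi - ybar) for x, yi in zip(X, y)) / sxx
    b1 = ybar - b2 * xbar
    y0 = B1 + B2 * X0 + random.gauss(0.0, math.sqrt(SIGMA2))  # new, independent u0
    errors.append(y0 - (b1 + b2 * X0))

err_mean = statistics.mean(errors)
err_var = statistics.pvariance(errors)
print(f"Eq. (4): {var_mean_eq4:.4f}, Eq. (5.10.2): {var_mean_5102:.4f}")
print(f"Eq. (5.10.6): {var_indiv_5106:.4f}, simulated: {err_var:.4f}")
```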





Chapter 6

Extensions of the Two-Variable Linear Regression Model

Some aspects of linear regression analysis can be easily introduced within the framework 
of the two-variable linear regression model that we have been discussing so far. First we 
consider the case of regression through the origin, that is, a situation where the
intercept term, \(\beta_1\), is absent from the model. Then we consider the question of the units of
measurement, that is, how the Y and X variables are measured and whether a change in the 
units of measurement affects the regression results. Finally, we consider the question of the 
functional form of the linear regression model. So far we have considered models that 
are linear in the parameters as well as in the variables. But recall that the regression theory 
developed in the previous chapters requires only that the parameters be linear; the variables 
may or may not enter linearly in the model. By considering models that are linear in the 
parameters but not necessarily in the variables, we show in this chapter how the two- 
variable models can deal with some interesting practical problems. 

Once the ideas introduced in this chapter are grasped, their extension to multiple 
regression models is quite straightforward, as we shall show in Chapters 7 and 8. 

6.1 Regression through the Origin 

There are occasions when the two-variable population regression function (PRF) assumes 
the following form: 

\[ Y_i = \beta_2 X_i + u_i \tag{6.1.1} \]

In this model the intercept term is absent or zero, hence the name regression through the 
origin. 

As an illustration, consider the capital asset pricing model (CAPM) of modern portfolio
theory, which, in its risk-premium form, may be expressed as 1

\[ (ER_i - r_f) = \beta_i (ER_m - r_f) \tag{6.1.2} \]


1 See Haim Levy and Marshall Sarnat, Portfolio and Investment Selection: Theory and Practice, Prentice-
Hall International, Englewood Cliffs, NJ, 1984, Chap. 14. 
where \(ER_i\) = expected rate of return on security i

\(ER_m\) = expected rate of return on the market portfolio as represented by, say, the
S&P 500 composite stock index

\(r_f\) = risk-free rate of return, say, the return on 90-day Treasury bills
\(\beta_i\) = the Beta coefficient, a measure of systematic risk, i.e., risk that cannot be
eliminated through diversification. Also, a measure of the extent to which
the ith security’s rate of return moves with the market. A \(\beta_i > 1\) implies a
volatile or aggressive security, whereas a \(\beta_i < 1\) suggests a defensive
security. (Note: Do not confuse this \(\beta_i\) with the slope coefficient of the two-variable
regression, \(\beta_2\).)

If capital markets work efficiently, then CAPM postulates that security i’s expected risk
premium (= \(ER_i - r_f\)) is equal to that security’s \(\beta\) coefficient times the expected market
risk premium (= \(ER_m - r_f\)). If the CAPM holds, we have the situation depicted in
Figure 6.1. The line shown in the figure is known as the security market line (SML).

For empirical purposes, Equation 6.1.2 is often expressed as

\[ R_i - r_f = \beta_i (R_m - r_f) + u_i \tag{6.1.3} \]

or

\[ R_i - r_f = \alpha_i + \beta_i (R_m - r_f) + u_i \tag{6.1.4} \]

The latter model is known as the Market Model. 2 If CAPM holds, \(\alpha_i\) is expected to be
zero. (See Figure 6.2.)

In passing, note that in Equation 6.1.4 the dependent variable, Y, is \((R_i - r_f)\) and the
explanatory variable, X, is \(\beta_i\), the volatility coefficient, and not \((R_m - r_f)\). Therefore, to run
regression Eq. (6.1.4), one must first estimate \(\beta_i\), which is usually derived from the
characteristic line, as described in Exercise 5.5. (For further details, see Exercise 8.28.)

As this example shows, sometimes the underlying theory dictates that the intercept 
term be absent from the model. Other instances where the zero-intercept model may be 
appropriate are Milton Friedman’s permanent income hypothesis, which states that permanent
consumption is proportional to permanent income; cost analysis theory, where it is

FIGURE 6.1
Systematic risk.



2 See, for instance, Diana R. Harrington, Modern Portfolio Theory and the Capital Asset Pricing Model: A 
User's Guide, Prentice Hall, Englewood Cliffs, NJ, 1983, p. 71. 
FIGURE 6.2
The Market Model
of Portfolio Theory
(assuming \(\alpha_i = 0\)).

postulated that the variable cost of production is proportional to output; and some versions 
of monetarist theory that state that the rate of change of prices (i.e., the rate of inflation) is 
proportional to the rate of change of the money supply. 

How do we estimate models like Eq. (6.1.1), and what special problems do they pose? To 
answer these questions, let us first write the sample regression function (SRF) of Eq. (6.1.1), 
namely, 


\[ Y_i = \hat\beta_2 X_i + \hat u_i \tag{6.1.5} \]

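Now applying the ordinary least squares (OLS) method to Eq. (6.1.5), we obtain the following
formulas for \(\hat\beta_2\) and its variance (proofs are given in Appendix 6A, Section 6A.1):

\[ \hat\beta_2 = \frac{\sum X_i Y_i}{\sum X_i^2} \tag{6.1.6} \]

\[ \operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum X_i^2} \tag{6.1.7} \]

where \(\sigma^2\) is estimated by

\[ \hat\sigma^2 = \frac{\sum \hat u_i^2}{n - 1} \tag{6.1.8} \]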


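It is interesting to compare these formulas with those obtained when the intercept term is
included in the model:

\[ \hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2} \tag{3.1.6} \]

\[ \operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_i^2} \tag{3.3.1} \]

\[ \hat\sigma^2 = \frac{\sum \hat u_i^2}{n - 2} \tag{3.3.5} \]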
EXAMPLE 6.1


The differences between the two sets of formulas should be obvious: First, in the model with the
intercept term absent, we use raw sums of squares and cross products, but in the intercept-present
model, we use adjusted (from mean) sums of squares and cross products. Second,
the df for computing \(\hat\sigma^2\) is (n - 1) in the first case and (n - 2) in the second case. (Why?)
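The contrast between raw and mean-corrected sums, and between n - 1 and n - 2 df, can be seen by fitting both models to the same data. The numbers below are hypothetical and serve only as an illustration. The through-origin fit satisfies its own normal equation \(\sum X_i\hat u_i = 0\), but its residuals need not sum to zero, whereas those of the intercept-present fit always do.

```python
import math

# Hypothetical data, for illustration only
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n = len(X)

# Regression through the origin: raw sums, n - 1 df (Eqs. 6.1.6 to 6.1.8)
b2_origin = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
u_origin = [y - b2_origin * x for x, y in zip(X, Y)]
s2_origin = sum(u * u for u in u_origin) / (n - 1)
se_origin = math.sqrt(s2_origin / sum(x * x for x in X))

# Intercept-present model: mean-corrected sums, n - 2 df (Eqs. 3.1.6, 3.3.1, 3.3.5)
xbar, ybar = sum(X) / n, sum(Y) / n
sxx = sum((x - xbar) ** 2 for x in X)
b2_conv = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
b1_conv = ybar - b2_conv * xbar
u_conv = [y - b1_conv - b2_conv * x for x, y in zip(X, Y)]
s2_conv = sum(u * u for u in u_conv) / (n - 2)
se_conv = math.sqrt(s2_conv / sxx)

print(f"through origin: b2 = {b2_origin:.4f} (se {se_origin:.4f}), "
      f"sum of residuals = {sum(u_origin):.4f}")
print(f"with intercept: b1 = {b1_conv:.4f}, b2 = {b2_conv:.4f} (se {se_conv:.4f}), "
      f"sum of residuals = {sum(u_conv):.2e}")
```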

Although the interceptless or zero intercept model may be appropriate on occasions,
there are some features of this model that need to be noted. First, \(\sum \hat u_i\), which is always
zero for the model with the intercept term (the conventional model), need not be zero when
that term is absent. In short, \(\sum \hat u_i\) need not be zero for the regression through the origin.
Second, \(r^2\), the coefficient of determination introduced in Chapter 3, which is always nonnegative
for the conventional model, can on occasions turn out to be negative for the interceptless
model! This anomalous result arises because the \(r^2\) introduced in Chapter 3
explicitly assumes that the intercept is included in the model. Therefore, the conventionally
computed \(r^2\) may not be appropriate for regression-through-the-origin models. 3


\(r^2\) for Regression-through-Origin Model

As just noted, and as further discussed in Appendix 6A, Section 6A.1, the conventional \(r^2\)
given in Chapter 3 is not appropriate for regressions that do not contain the intercept. But
one can compute what is known as the raw \(r^2\) for such models, which is defined as

\[ \text{raw } r^2 = \frac{\left(\sum X_i Y_i\right)^2}{\sum X_i^2 \sum Y_i^2} \tag{6.1.9} \]


Note: These are raw (i.e., not mean-corrected) sums of squares and cross products. 

Although this raw \(r^2\) satisfies the relation \(0 \le r^2 \le 1\), it is not directly comparable to the
conventional \(r^2\) value. For this reason some authors do not report the \(r^2\) value for zero
intercept regression models.
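A small hypothetical data set with a large intercept shows both anomalies at once. When a regression through the origin is fitted and the conventional formula \(r^2 = 1 - \text{RSS}/\text{TSS}\) (with the mean-corrected TSS) is applied, \(r^2\) comes out negative and the residuals do not sum to zero, while the raw \(r^2\) of Eq. (6.1.9) stays between 0 and 1.

```python
# Hypothetical data with a large intercept, chosen only to make the point
X = [1.0, 2.0, 3.0, 4.0]
Y = [10.0, 9.0, 11.0, 10.0]

sxy_raw = sum(x * y for x, y in zip(X, Y))
sxx_raw = sum(x * x for x in X)
syy_raw = sum(y * y for y in Y)
b2 = sxy_raw / sxx_raw                          # Eq. (6.1.6)
resid = [y - b2 * x for x, y in zip(X, Y)]
rss = sum(u * u for u in resid)

ybar = sum(Y) / len(Y)
tss = sum((y - ybar) ** 2 for y in Y)           # mean-corrected total sum of squares
r2_conventional = 1 - rss / tss                 # Chapter 3 formula, misapplied here
r2_raw = sxy_raw ** 2 / (sxx_raw * syy_raw)     # Eq. (6.1.9)

print(f"sum of residuals = {sum(resid):.4f}")
print(f"conventional r2 = {r2_conventional:.4f}, raw r2 = {r2_raw:.4f}")
```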

Because of these special features of this model, one needs to exercise great caution in 
using the zero intercept regression model. Unless there is very strong a priori expectation, 
one would be well advised to stick to the conventional, intercept-present model. This has a 
dual advantage. First, if the intercept term is included in the model but it turns out to be
statistically insignificant (i.e., statistically equal to zero), for all practical purposes we have a
regression through the origin. 4 Second, and more important, if in fact there is an intercept
in the model but we insist on fitting a regression through the origin, we would be
committing a specification error. We will discuss this more in Chapter 7.


Table 6.1 gives data on excess returns \(Y_t\) (%) on an index of 104 stocks in the sector of
cyclical consumer goods and excess returns \(X_t\) (%) on the overall stock market index for
the U.K. for the monthly data for the period 1980-1999, for a total of 240 observations. 5
Excess return refers to return in excess of return on a riskless asset (see the CAPM model).


3 For additional discussion, see Dennis J. Aigner, Basic Econometrics, Prentice Hall, Englewood Cliffs, NJ, 
1971, pp. 85-88. 

4 Henri Theil points out that if the intercept is in fact absent, the slope coefficient may be estimated 
with far greater precision than with the intercept term left in. See his Introduction to Econometrics, 
Prentice Hall, Englewood Cliffs, NJ, 1978, p. 76. See also the numerical example given next. 

5 These data, originally obtained from DataStream databank, are reproduced from Christiaan Heij et al.,
Econometric Methods with Applications in Business and Economics, Oxford University Press, Oxford,
U.K., 2004.




TABLE 6.1

OBS       Y                X
1980:01   6.08022852       7.263448404
1980:02   -0.924185461     6.339895504
1980:03   -3.286174252     -9.285216834
1980:04   5.211976571      0.793290771
1980:05   -16.16421111     -2.902420985
1980:06   -1.054703649     8.613150875
1980:07   11.17237699      3.982062848
1980:08   -11.06327551     -1.150170907
1980:09   -16.77699609     3.486125868
1980:10   -7.021834032     4.329850278
1980:11   -9.71684668      0.936875279
1980:12   5.215705717      -5.202455846
1981:01   -6.612000956     -2.082757509
1981:02   4.264498443      2.728522893
1981:03   4.916710821      0.653397106
1981:04   22.20495946      6.436071962
1981:05   -11.29868524     -4.259197932
1981:06   -5.770507783     0.543909707
1981:07   -5.217764717     -0.486845933
1981:08   16.19620175      2.843999508
1981:09   -17.16995395     -16.4572142
1981:10   1.105334728      4.468938171
1981:11   11.6853367       5.885519658
1981:12   -2.301451728     -0.390698164
1982:01   8.643728679      2.499567896
1982:02   -11.12907503     -4.033607075
1982:03   1.724627956      3.042525777
1982:04   0.157879967      0.734564665
1982:05   -1.875202616     2.779732288
1982:06   -10.62481767     -5.900116576
1982:07   -5.761135416     3.005344385
1982:08   5.481432596      3.954990619
1982:09   -17.02207459     2.547127067
1982:10   7.625420708      4.329008106
1982:11   -6.575721646     0.191940594
1982:12   -2.372829861     -0.92167555
1983:01   17.52374936      3.394682577
1983:02   1.354655809      0.758714353
1983:03   16.26861049      1.862073664
1983:04   -6.074547158     6.797751341
1983:05   -0.826650702     -1.699253628
1983:06   3.807881996      4.092592402
1983:07   0.57570091       -2.926299262
1983:08   3.755563441      1.773424306
1983:09   -5.365927271     -2.800815667
1983:10   -3.750302815     -1.505394995
1983:11   4.898751703      4.18696284
1983:12   4.379256151      1.201416981
1984:01   16.56016188      6.769320788
1984:02   1.523127464      -1.686027417
1984:03   1.0206078        5.245806105
1984:04   -3.899307684     1.728710264
1984:05   -14.32501615     -7.279075595
1984:06   3.056627177      -0.77947067
1984:07   -0.02153592      -2.439634487
1984:08   3.355102212      8.445977813
1984:09   0.100006778      1.221080129
1984:10   1.691250318      2.733386772
1984:11   8.20075301       5.12753329


1984:12 
1985:01 
1985:02 
1985:03 
1985:04 
1985:05 
1985:06 
1985:07 
1985:08 
1985:09 
1985:10 
1985:11 
1985:12 
1986:01 
1986:02 
1986:03 
1986:04 
1986:05 
1986:06 
1986:07 
1986:08 
1986:09 
1986:10 
1986:11 
1986:12 
1987:01 
1987:02 
1987:03 
1987:04 
1987:05 
1987:06 
1987:07 
1987:08 
1987:09 
1987:10 
1987:11 
1987:12 
1988:01 
1988:02 
1988:03 
1988:04 
1988:05 
1988:06 
1988:07 
1988:08 
1988:09 
1988:10 
1988:11 
1988:12 
1989:01 
1989:02 
1989:03 
1989:04 
1989:05 
1989:06 
1989:07 
1989:08 
1989:09 
1989:10 


3.52786616 
4.554587707 
5.365478677 
4.525231564 
2.944654344 
-0.268599528 
-3.661040481 
-4.540505062 
9.195292816 
-1.894817019 
12.00661274 
1.233987382 
-1.446329607 
6.023618851 
10.51235756 
13.40071024 
-7.796262998 
0.211540446 
6.471111064 
-9.037475168 
-5.47838091 
-6.756881852 
-2.564960223 
2.456599468 
1.476421303 
17.0694004 
7.565726727 
-3.239325817 
3.662578335 
7.157455113 
4.774901623 
4.23770166 
-0.881352219 
11.49688416 
-35.56617624 
-14.59137369 
14.87271664 
1.748599294 
-0.606016446 
-6.078095523 
3.976153828 
-1.050910058 
3.317856956 
0.407100105 
-11.87932524 
-8.801026046 
6.784211277 
-10.20578119 
-6.73805381 
12.83903643 
3.302860922 
-0.155918301 
3.623090767 
-1.167680873 
-1.221603303 
5.262902744 
4.845013219 
-5.069564838 
-13.57963526 


3.191554763 
3.907838688 
-1.708567484 
0.435218492 
0.958067845 
1.095477375 
-6.816108909 
2.785054354 
3.900209023 
-4.203004414 
5.60179802 
1.570093976 
-1.084427121 
0.778669473 
6.470651262 
8.953781192 
-2.387761685 
-2.873838588 
3.440269098 
-5.891053375 
6.375582004 
-5.734839396 
3.63088408 
-1.31606687 
3.521601216 
8.673412896 
6.914361923 
-0.460660854 
4.295976077 
7.719692529 
3.039887622 
2.510223804 
-3.039443563 
3.787092018 
-27.86969311 
-9.956367094 
7.975865948 
3.936938398 
-0.32797064 
-2.161544202 
2.721 787842 
-0.514825422 
3.128796482 
0.181502075 
-7.892363786 
3.347081899 
3.158592144 
-4.816470363 
-0.008549997 
13.46098219 
-0.764474692 
2.298491097 
0.762074588 
-0.495796117 
1.206636013 
4.637026116 
2.680874116 
-5.303858035 
-7.210655599 
( Continued) 
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OBS 


OBS 


1989:11 1.100607603 

1989:12 4.925083189 

1990:01 -2.532068851 

1990:02 -6.601872876 

1990:03 -1.023768943 

1990:04 -7.097917266 

1990:05 6.376626925 

1990:06 1.861974711 

1990:07 -5.591527585 

1990:08 -15.31758975 

1990:09 -10.17227358 

1990:10 -2.217396045 

1990:11 5.974205798 

1990:12 -0.857289036 

1991:01 -3.780184589 

1991:02 20.64721437 

1991:03 10.94068018 

1991:04 -3.145639589 

1991:05 -3.142887645 

1991:06 -1.960866141 

1991:07 7.330964031 

1991:08 7.854387926 

1991:09 2.539177843 

1991:10 -1.233244642 

1991:11 -11.7460404 

1991:12 1.078226286 

1992:01 5.937904622 

1992:02 4.113184542 

1992:03 -0.655199392 

1992:04 15.28430278 

1992:05 3.994517585 

1992:06 -11.94450998 

1992:07 -2.530701327 

1992:08 -9.842366221 

1992:09 18.11573724 

1992:10 0.200950206 

1992:11 1.125853097 

1992:12 7.639180786 

1993:01 2.919569408 

1993:02 -1.062404105 

1993:03 1.292641409 

1993:04 0.420241384 

1993:05 -2.514080553 

1993:06 0.419362276 

1993:07 4.374024535 

1993:08 1.733528075 

1993:09 -3.659808969 

1993:10 5.85690764 

1993:11 -1.365550294 

1993:12 -1.346979017 

1994:01 12.89578758 

1994:02 -5.346700561 

1994:03 -7.614726564 

1994:04 10.22042923 

1994:05 -6.928422261 

1994:06 -5.065919037 

1994:07 7.483498556 

1994:08 1.828762662 

1994:09 -5.69293279 

1994:10 -2.426962489 

1994:11 2.125100668 


5.350185944 
4.106245855 
-3.629547374 
-5.205804299 
-2.183244863 
-5.408563794 
10.57599169 
-0.338612099 
-2.21316202 
-8.476177427 
-7.45941471 
-0.085887763 
5.034770534 
-1.767714908 
0.189108456 
10.38741504 
2.921913827 
0.971720188 
-0.4317819 
-3.342924986 
5.242811509 
2.880654691 
-1.121472224 
-3.969577956 
-5.707995062 
1.502567049 
2.599565094 
0.135881087 
-6.146138064 
10.45736831 
1.415987046 
-8.261109424 
-3.778812167 
-5.386818488 
11.19436372 
3.999870038 
3.620674752 
2.887222251 
1.336746091 
1.240273846 
0.407144312 
-1.734930047 
1.111533687 
1.354127742 
1.943061568 
4.96197982 7 
-1.618729936 
4.215408608 
1.880360165 
5.826352413 
2.973540693 
-5.479858563 
-5.784547088 
1.157083438 
-6.356199493 
-0.843583888 
5.779953224 
3.298130184 
-7.110010085 
2.968005597 
-1.531245158 


1994:12 -4.225370964 

1995:01 -6.302392617 

1995:02 1.27867637 

1995:03 10.90890516 

1995:04 2.497849434 

1995:05 2.891526594 

1995:06 -3.773000069 

1995:07 8.776288715 

1995:08 2.88256097 

1995:09 2.14691333 

1995:10 -4.590104662 

1995:11 -1.293255187 

1995:12 -4.244101531 

1996:01 6.647088904 

1996:02 1.635900742 

1996:03 7.8581899 

1996:04 0.789544896 

1996:05 -0.907725397 

1996:06 -0.392246948 

1996:07 -1.035896351 

1996:08 2.556816005 

1996:09 3.131830038 

1996:10 -0.020947358 

1996:11 -5.312287782 

1996:12 -5.196176326 

1997:01 -0.753247124 

1997:02 -2.474343938 

1997:03 2.47647802 

1997:04 -1.119104196 

1997:05 3.352076269 

1997:06 -1.910172239 

1997:07 0.142814607 

1997:08 10.50199263 

1997:09 12.98501943 

1997:10 -4.134761655 

1997:11 -4.148579856 

1997:12 -1.752478236 

1998:01 -3.349121498 

1998:02 14.07471304 

1998:03 7.791650968 

1998:04 5.154679109 

1998:05 3.293686179 

1998:06 -13.25461802 

1998:07 -7.714205916 

1998:08 -15.26340483 

1998:09 -15.22865141 

1998:10 15.96218038 

1998:11 -8.684089113 

1998:12 17.13842369 

1999:01 -1.468448611 

1999:02 8.5036 

1999:03 10.8943073 

1999:04 13.03497394 

1999:05 -5.654671597 

1999:06 8.321969316 

1999:07 0.507652273 

1999:08 -5.022980561 

1999:09 -2.305448839 

1999:10 -1.876879466 

1999:11 1.348824769 

1999:12 -2.64164938 


0.264280259 
-2.420388431 
0.138795213 
3.231656585 
2.215804682 
3.856813589 
-0.952204306 
4.020036363 
1.423600345 
-0.037912571 
-1.17655329 
3.760277356 
0.434626357 
1.906345103 
0.301898961 
-0.314132324 
3.034331741 
-1.497346299 
-0.894676854 
-0.532816274 
3.863737088 
2.118254897 
-0.853553262 
1.770340939 
1.702551635 
3.465753348 
1.115253221 
-2.057818461 
3.57089955 
1.953480438 
2.458700404 
2.992341297 
-0.457968038 
8.111278967 
-6.967124504 
-0.155924791 
3.853283433 
7.379466014 
4.299097886 
3.410780517 
-0.081494993 
-1.613131159 
-0.397288954 
-2.237365283 
-12.4631993 
-5.170734985 
11.70544788 
-0.380200223 
4.986705187 
2.493727994 
0.937105259 
4.280082506 
3.960824402 
-4.499198079 
3.656745699 
-2.503971473 
-0.121901923 
-5.388032432 
4.010989716 
6.265312975 
4.045658427 
EXAMPLE 6.1 (Continued)


First we fit model (6.1.3) to these data. Using EViews 6 we obtained the following 
regression results, which are given in the standard EViews format. 

Dependent Variable: Y
Method: Least Squares
Sample: 1980M01 1999M12
Included observations: 240

             Coefficient    Std. Error    t-Statistic    Prob.
X            1.155512       0.074396      15.53200       0.0000

R-squared              0.500309    Mean dependent var.      0.499826
Adjusted R-squared*    0.500309    S.D. dependent var.      7.849594
S.E. of regression     5.548786    Durbin-Watson stat.**    1.972853
Sum squared resid.     7358.578

*See Chapter 7.
**We will discuss this statistic in Chapter 12.



As these results show, the slope coefficient, which is the Beta coefficient, is highly significant, 
for its p value is extremely small. The interpretation here is that if the excess market rate of return 
goes up by 1 percentage point, the excess return on the index of the consumer goods sector goes up 
by about 1.16 percentage points. Not only is the slope coefficient statistically significant, but 
it is significantly greater than 1 (can you verify this?). If a Beta coefficient is greater than 1, 
such a security (here a portfolio of 104 stocks) is said to be volatile; it moves more than 
proportionately with the overall stock market index. But this finding should not be surprising, 
for in this example we are considering stocks from the sector of cyclical consumer goods such 
as household durables, automobiles, textiles, and sports equipment. 
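The parenthetical question can be answered with a one-sided t test of H0: β2 = 1 against H1: β2 > 1, using only the reported slope and standard error. The sketch below is a minimal illustration, not part of the EViews output. To stay within the Python standard library it approximates the t distribution with 240 − 1 = 239 degrees of freedom by the standard normal, an assumption that is very close at this sample size.

```python
import math


def one_sided_t_test(estimate, std_error, null_value):
    """t statistic and approximate upper-tail p value for H1: beta > null_value.

    The p value uses the standard normal as a large-sample stand-in for the
    t distribution (an approximation, reasonable here with 239 df).
    """
    t = (estimate - null_value) / std_error
    p_upper = 0.5 * math.erfc(t / math.sqrt(2.0))  # P(Z > t)
    return t, p_upper


# Slope and standard error from the regression-through-the-origin output above.
t_stat, p_value = one_sided_t_test(1.155512, 0.074396, 1.0)
print(f"t = {t_stat:.3f}, approximate one-sided p = {p_value:.4f}")
```

The statistic is roughly 2.09, above the one-sided 5 percent critical value of about 1.65, so the hypothesis β2 = 1 is rejected in favor of β2 > 1 at that level.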

If we fit model (6.1.4), we obtain the following results: 

Dependent Variable: Y
Method: Least Squares
Sample: 1980M01 1999M12
Included observations: 240

             Coefficient    Std. Error    t-Statistic    Prob.
C            -0.447481      0.362943      -1.232924      0.2188
X            1.171128       0.075386      15.53500       0.0000

R-squared             0.503480    Mean dependent var.    0.499826
Adjusted R-squared    0.501394    S.D. dependent var.    7.849594
S.E. of regression    5.542759    Durbin-Watson stat.    1.984746
Sum squared resid.    7311.877    Prob. (F-statistic)    0.000000
F-statistic           241.3363


From these results we see that the intercept is not statistically different from zero, although 
the slope coefficient (the Beta coefficient) is highly statistically significant. This suggests 
that the regression-through-the-origin model fits the data well. Besides, statistically there is 
no difference in the value of the slope coefficient in the two models. Note that the standard 
error of the slope coefficient in the regression-through-the-origin model is slightly lower 
than the one in the intercept-present model, thus supporting Theil's argument given in 
footnote 4. Even then, the slope coefficient is statistically greater than 1, once again 
confirming that returns on the stocks in the cyclical consumer goods sector are volatile. 

By the way, note that the r² value given for the regression-through-the-origin model 
should be taken with a grain of salt, for the traditional formula of r² is not applicable for such 
models. EViews, however, routinely presents the standard r² value even for such models. 
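The reported summary statistics can be reproduced from the residual sum of squares, the sample size, and the S.D. of the dependent variable, under the assumption that EViews computes r² as 1 − RSS/TSS with TSS measured about the mean of Y, even when the model has no intercept. The sketch below (the helper name `fit_summary` is hypothetical) makes that assumption explicit. The fact that it matches both outputs suggests the "R-squared" printed for the regression through the origin is the conventional centered measure, which is exactly the formula the text cautions against for such models.

```python
import math


def fit_summary(rss, n, k, sd_y):
    """Rebuild r^2, adjusted r^2, and the S.E. of regression from the RSS.

    rss  : sum of squared residuals
    n    : number of observations
    k    : number of estimated coefficients (1 without intercept, 2 with)
    sd_y : sample standard deviation of the dependent variable
    """
    tss = (n - 1) * sd_y ** 2                      # total SS about the mean of Y
    r2 = 1.0 - rss / tss                           # conventional (centered) r^2
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)  # degrees-of-freedom adjustment
    ser = math.sqrt(rss / (n - k))                 # S.E. of regression
    return r2, adj_r2, ser


origin = fit_summary(rss=7358.578, n=240, k=1, sd_y=7.849594)
with_intercept = fit_summary(rss=7311.877, n=240, k=2, sd_y=7.849594)
print("through origin :", [round(v, 6) for v in origin])
print("with intercept :", [round(v, 6) for v in with_intercept])
```

With k = 1 the adjustment factor (n − 1)/(n − k) equals 1, which is why the first output shows identical R-squared and adjusted R-squared values.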
6.2 Scaling and Units of Measurement 


To grasp the ideas developed in this section, consider the data given in Table 6.2, which 
refers to U.S. gross private domestic investment (GPDI) and gross domestic product (GDP), 
in billions as well as millions of (chained) 2000 dollars. 

Suppose in the regression of GPDI on GDP one researcher uses data in billions of dollars 
but another expresses data in millions of dollars. Will the regression results be the same 
in both cases? If not, which results should one use? In short, do the units in which the 
regressand and regressor(s) are measured make any difference in the regression results? If 
so, what is the sensible course to follow in choosing units of measurement for regression 
analysis? To answer these questions, let us proceed systematically. Let 

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (6.2.1)$$

where Y = GPDI and X = GDP. Define 

$$Y_i^* = w_1 Y_i \qquad (6.2.2)$$

$$X_i^* = w_2 X_i \qquad (6.2.3)$$

where $w_1$ and $w_2$ are constants, called the scale factors; $w_1$ may equal $w_2$ or be different. 

From Equations 6.2.2 and 6.2.3 it is clear that $Y_i^*$ and $X_i^*$ are rescaled $Y_i$ and $X_i$. Thus, 
if $Y_i$ and $X_i$ are measured in billions of dollars and one wants to express them in millions 
of dollars, we will have $Y_i^* = 1000\,Y_i$ and $X_i^* = 1000\,X_i$; here $w_1 = w_2 = 1000$. 

Now consider the regression using $Y_i^*$ and $X_i^*$ variables: 

$$Y_i^* = \beta_1^* + \beta_2^* X_i^* + u_i^* \qquad (6.2.4)$$

where $Y_i^* = w_1 Y_i$, $X_i^* = w_2 X_i$, and $u_i^* = w_1 u_i$. (Why?) 


TABLE 6.2 Gross Private Domestic Investment and GDP, United States, 1990-2005 
(Billions of chained [2000] dollars, except as noted; quarterly data at seasonally adjusted annual rates) 

Year   GPDIBL    GPDIM         GDPB       GDPM
1990   886.6     886,600.0     7,112.5    7,112,500.0
1991   829.1     829,100.0     7,100.5    7,100,500.0
1992   878.3     878,300.0     7,336.6    7,336,600.0
1993   953.5     953,500.0     7,532.7    7,532,700.0
1994   1,042.3   1,042,300.0   7,835.5    7,835,500.0
1995   1,109.6   1,109,600.0   8,031.7    8,031,700.0
1996   1,209.2   1,209,200.0   8,328.9    8,328,900.0
1997   1,320.6   1,320,600.0   8,703.5    8,703,500.0
1998   1,455.0   1,455,000.0   9,066.9    9,066,900.0
1999   1,576.3   1,576,300.0   9,470.3    9,470,300.0
2000   1,679.0   1,679,000.0   9,817.0    9,817,000.0
2001   1,629.4   1,629,400.0   9,890.7    9,890,700.0
2002   1,544.6   1,544,600.0   10,048.8   10,048,800.0
2003   1,596.9   1,596,900.0   10,301.0   10,301,000.0
2004   1,713.9   1,713,900.0   10,703.5   10,703,500.0
2005   1,842.0   1,842,000.0   11,048.6   11,048,600.0

Note: GPDIBL = gross private domestic investment, billions of 2000 dollars. 
GPDIM = gross private domestic investment, millions of 2000 dollars. 
GDPB = gross domestic product, billions of 2000 dollars. 
GDPM = gross domestic product, millions of 2000 dollars. 

Source: Economic Report of the President, 2007, Table B-2, p. 328. 








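We want to find out the relationships between the following pairs: 

1. $\hat{\beta}_1$ and $\hat{\beta}_1^*$ 
2. $\hat{\beta}_2$ and $\hat{\beta}_2^*$ 
3. $\operatorname{var}(\hat{\beta}_1)$ and $\operatorname{var}(\hat{\beta}_1^*)$ 
4. $\operatorname{var}(\hat{\beta}_2)$ and $\operatorname{var}(\hat{\beta}_2^*)$ 
5. $\hat{\sigma}^2$ and $\hat{\sigma}^{*2}$ 
6. $r^2_{xy}$ and $r^2_{x^*y^*}$ 

From least-squares theory we know (see Chapter 3) that 

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} \qquad (6.2.5)$$

$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} \qquad (6.2.6)$$

$$\operatorname{var}(\hat{\beta}_1) = \frac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2 \qquad (6.2.7)$$

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2} \qquad (6.2.8)$$

$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2} \qquad (6.2.9)$$

Applying the OLS method to Equation 6.2.4, we obtain similarly 

$$\hat{\beta}_1^* = \bar{Y}^* - \hat{\beta}_2^* \bar{X}^* \qquad (6.2.10)$$

$$\hat{\beta}_2^* = \frac{\sum x_i^* y_i^*}{\sum x_i^{*2}} \qquad (6.2.11)$$

$$\operatorname{var}(\hat{\beta}_1^*) = \frac{\sum X_i^{*2}}{n \sum x_i^{*2}}\,\sigma^{*2} \qquad (6.2.12)$$

$$\operatorname{var}(\hat{\beta}_2^*) = \frac{\sigma^{*2}}{\sum x_i^{*2}} \qquad (6.2.13)$$

$$\hat{\sigma}^{*2} = \frac{\sum \hat{u}_i^{*2}}{n - 2} \qquad (6.2.14)$$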


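From these results it is easy to establish relationships between the two sets of parameter 
estimates. All that one has to do is recall these definitional relationships: $Y_i^* = w_1 Y_i$ (or 
$y_i^* = w_1 y_i$); $X_i^* = w_2 X_i$ (or $x_i^* = w_2 x_i$); $\hat{u}_i^* = w_1 \hat{u}_i$; $\bar{Y}^* = w_1 \bar{Y}$; and $\bar{X}^* = w_2 \bar{X}$. Making 
use of these definitions, the reader can easily verify that 

$$\hat{\beta}_2^* = \left(\frac{w_1}{w_2}\right)\hat{\beta}_2 \qquad (6.2.15)$$

$$\hat{\beta}_1^* = w_1 \hat{\beta}_1 \qquad (6.2.16)$$

$$\hat{\sigma}^{*2} = w_1^2 \hat{\sigma}^2 \qquad (6.2.17)$$

$$\operatorname{var}(\hat{\beta}_1^*) = w_1^2 \operatorname{var}(\hat{\beta}_1) \qquad (6.2.18)$$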
$$\operatorname{var}(\hat{\beta}_2^*) = \left(\frac{w_1}{w_2}\right)^2 \operatorname{var}(\hat{\beta}_2) \qquad (6.2.19)$$

$$r^2_{xy} = r^2_{x^*y^*} \qquad (6.2.20)$$

From the preceding results it should be clear that, given the regression results based on 
one scale of measurement, one can derive the results based on another scale of measurement 
once the scaling factors, the w's, are known. In practice, though, one should choose 
the units of measurement sensibly; there is little point in carrying all those zeros in 
expressing numbers in millions or billions of dollars. 

From the results given in (6.2.15) through (6.2.20) one can easily derive some special 
cases. For instance, if $w_1 = w_2$, that is, the scaling factors are identical, the slope coefficient 
and its standard error remain unaffected in going from the $(Y_i, X_i)$ to the $(Y_i^*, X_i^*)$ scale, 
which should be intuitively clear. However, the intercept and its standard error are both 
multiplied by $w_1$. But if the X scale is not changed (i.e., $w_2 = 1$) and the Y scale is changed by 
the factor $w_1$, the slope as well as the intercept coefficients and their respective standard 
errors are all multiplied by the same $w_1$ factor. Finally, if the Y scale remains unchanged (i.e., 
$w_1 = 1$) but the X scale is changed by the factor $w_2$, the slope coefficient and its standard 
error are multiplied by the factor $(1/w_2)$ but the intercept coefficient and its standard error 
remain unaffected. 

It should, however, be noted that the transformation from the $(Y, X)$ to the $(Y^*, X^*)$ scale 
does not affect the properties of the OLS estimators discussed in the preceding chapters. 

EXAMPLE 6.2 The Relationship between the GPDI and GDP, United States, 1990-2005 

To substantiate the preceding theoretical results, let us return to the data given in 
Table 6.2 and examine the following results (numbers in parentheses are the estimated 
standard errors). 

Both GPDI and GDP in billions of dollars: 

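$$\widehat{\text{GPDI}}_t = -926.090 + 0.2535\,\text{GDP}_t$$

$$\text{se} = (116.358)\quad(0.0129) \qquad r^2 = 0.9648 \qquad (6.2.21)$$

Both GPDI and GDP in millions of dollars: 

$$\widehat{\text{GPDI}}_t = -926{,}090 + 0.2535\,\text{GDP}_t$$

$$\text{se} = (116{,}358)\quad(0.0129) \qquad r^2 = 0.9648 \qquad (6.2.22)$$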

Notice that the intercept as well as its standard error is 1000 times the corresponding 
values in the regression (6.2.21) (note that $w_1 = 1000$ in going from billions to millions of 
dollars), but the slope coefficient as well as its standard error is unchanged, in accordance 
with the theory. 

GPDI in billions of dollars and GDP in millions of dollars: 

$$\widehat{\text{GPDI}}_t = -926.090 + 0.0002535\,\text{GDP}_t$$

$$\text{se} = (116.358)\quad(0.0000129) \qquad r^2 = 0.9648 \qquad (6.2.23)$$

As expected, the slope coefficient as well as its standard error is 1/1000 of its value in 
Eq. (6.2.21), since only the X, or GDP, scale is changed. 

GPDI in millions of dollars and GDP in billions of dollars: 

$$\widehat{\text{GPDI}}_t = -926{,}090 + 253.524\,\text{GDP}_t$$

$$\text{se} = (116{,}358.7)\quad(12.9465) \qquad r^2 = 0.9648 \qquad (6.2.24)$$
Again notice that both the intercept and the slope coefficients as well as their respective 
standard errors are 1000 times their values in Eq. (6.2.21), in accordance with our 
theoretical results. 

Notice that in all the regressions presented above, the r² value remains the same, which 
is not surprising because the r² value is invariant to changes in the unit of measurement, 
as it is a pure, or dimensionless, number. 
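The four regressions can be reproduced, as a sketch, directly from Table 6.2. The code below is a plain-Python implementation of the two-variable OLS formulas (6.2.5) through (6.2.9), not the software used in the text; the helper name `ols` is hypothetical. It multiplies GPDI by $w_1$ and GDP by $w_2$ for each unit combination and refits, so the scaling relations (6.2.15) through (6.2.20) can be read off the printed output.

```python
import math

# Table 6.2, billions of chained (2000) dollars, 1990-2005.
GPDI_B = [886.6, 829.1, 878.3, 953.5, 1042.3, 1109.6, 1209.2, 1320.6,
          1455.0, 1576.3, 1679.0, 1629.4, 1544.6, 1596.9, 1713.9, 1842.0]
GDP_B = [7112.5, 7100.5, 7336.6, 7532.7, 7835.5, 8031.7, 8328.9, 8703.5,
         9066.9, 9470.3, 9817.0, 9890.7, 10048.8, 10301.0, 10703.5, 11048.6]


def ols(x, y):
    """Two-variable OLS with intercept; returns (b1, b2, se_b1, se_b2, r2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b2 = sxy / sxx                                     # Eq. (6.2.6)
    b1 = y_bar - b2 * x_bar                            # Eq. (6.2.5)
    sigma2 = (syy - b2 * sxy) / (n - 2)                # Eq. (6.2.9): RSS / (n - 2)
    se_b1 = math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx) * sigma2)  # Eq. (6.2.7)
    se_b2 = math.sqrt(sigma2 / sxx)                    # Eq. (6.2.8)
    r2 = b2 * sxy / syy                                # explained SS / total SS
    return b1, b2, se_b1, se_b2, r2


# (w1, w2): scale factors applied to Y = GPDI and X = GDP, respectively.
CASES = {
    "(6.2.21) both in billions": (1, 1),
    "(6.2.22) both in millions": (1000, 1000),
    "(6.2.23) GPDI billions, GDP millions": (1, 1000),
    "(6.2.24) GPDI millions, GDP billions": (1000, 1),
}

results = {}
for label, (w1, w2) in CASES.items():
    x = [w2 * v for v in GDP_B]
    y = [w1 * v for v in GPDI_B]
    results[label] = ols(x, y)
    b1, b2, se1, se2, r2 = results[label]
    print(f"{label}: b1 = {b1:,.3f}, b2 = {b2:.7f}, "
          f"se = ({se1:,.3f}, {se2:.7f}), r2 = {r2:.4f}")
```

The printed coefficients and standard errors should agree with Eqs. (6.2.21) through (6.2.24) up to rounding, and the r² is the same in every case, as (6.2.20) states.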


A Word about Interpretation 

Since the slope coefficient $\beta_2$ is simply the rate of change, it is measured in the units of the 
ratio 

$$\frac{\text{Units of the dependent variable}}{\text{Units of the explanatory variable}}$$

Thus in regression (6.2.21) the interpretation of the slope coefficient 0.2535 is that 
if GDP changes by a unit, which is 1 billion dollars, GPDI on the average changes by 
0.2535 billion dollars. In regression (6.2.23) a unit change in GDP, which is 1 million 
dollars, leads on average to a 0.0002535 billion dollar change in GPDI. The two results are 
of course identical in the effects of GDP on GPDI; they are simply expressed in different 
units of measurement. 


6.3 Regression on Standardized Variables 


We saw in the previous section that the units in which the regressand and regressor(s) are 
expressed affect the interpretation of the regression coefficients. This can be avoided if 
we are willing to express the regressand and regressor(s) as standardized variables. A 
variable is said to be standardized if we subtract the mean value of the variable from its 
individual values and divide the difference by the standard deviation of that variable. 

Thus, in the regression of Y and X, if we redefine these variables as 

$$Y_i^* = \frac{Y_i - \bar{Y}}{S_Y} \qquad (6.3.1)$$

$$X_i^* = \frac{X_i - \bar{X}}{S_X} \qquad (6.3.2)$$

where $\bar{Y}$ = sample mean of Y, $S_Y$ = sample standard deviation of Y, $\bar{X}$ = sample mean 
of X, and $S_X$ is the sample standard deviation of X; the variables $Y_i^*$ and $X_i^*$ are called 
standardized variables. 

An interesting property of a standardized variable is that its mean value is always zero 
and its standard deviation is always 1. (For proof, see Appendix 6A, Section 6A.2.) 
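A short sketch of Eqs. (6.3.1) and (6.3.2) in Python follows. It uses `statistics.stdev`, whose n − 1 divisor we assume corresponds to the text's "sample standard deviation". On a few GDP values from Table 6.2 it checks the mean-zero, unit-standard-deviation property, and it also shows that the standardized values are identical whether GDP is recorded in billions or in millions of dollars.

```python
import statistics


def standardize(values):
    """Eqs. (6.3.1)-(6.3.2): (value - sample mean) / sample standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# First five GDP values of Table 6.2, in billions and in millions of dollars.
gdp_billions = [7112.5, 7100.5, 7336.6, 7532.7, 7835.5]
gdp_millions = [1000 * v for v in gdp_billions]

z_b = standardize(gdp_billions)
z_m = standardize(gdp_millions)
print("mean:", round(statistics.mean(z_b), 12), " s.d.:", round(statistics.stdev(z_b), 12))
print("same standardized values in both units:",
      all(abs(a - b) < 1e-12 for a, b in zip(z_b, z_m)))
```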

As a result, it does not matter in what unit the regressand and regressor(s) are measured. 
Therefore, instead of running the standard (bivariate) regression: 

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (6.3.3)$$

we could run regression on the standardized variables as 

$$Y_i^* = \beta_1^* + \beta_2^* X_i^* + u_i^* \qquad (6.3.4)$$

$$= \beta_2^* X_i^* + u_i^* \qquad (6.3.5)$$
since it is easy to show that, in the regression involving standardized regressand and 
regressor(s), the intercept term is always zero.⁶ The regression coefficients of the 
standardized variables, denoted by $\beta_1^*$ and $\beta_2^*$, are known in the literature as the beta coefficients.⁷ 
Incidentally, notice that (6.3.5) is a regression through the origin. 

How do we interpret the beta coefficients? The interpretation is that if the (standardized) 
regressor increases by one standard deviation, on average, the (standardized) regressand 
increases by $\beta_2^*$ standard deviation units. Thus, unlike the traditional model in Eq. (6.3.3), we 
measure the effect not in terms of the original units in which Y and X are expressed, but in 
standard deviation units. 

To show the difference between Eqs. (6.3.3) and (6.3.5), let us return to the GPDI and 
GDP example discussed in the preceding section. The results of (6.2.21) discussed 
previously are reproduced here for convenience. 


$$\widehat{\text{GPDI}}_t = -926.090 + 0.2535\,\text{GDP}_t$$

$$\text{se} = (116.358)\quad(0.0129) \qquad r^2 = 0.9648 \qquad (6.3.6)$$


where GPDI and GDP are measured in billions of dollars. 

The results corresponding to Eq. (6.3.5) are as follows, where the starred variables are 
standardized variables: 


$$\widehat{\text{GPDI}}_t^* = 0.9822\,\text{GDP}_t^*$$

$$\text{se} = (0.0485) \qquad (6.3.7)$$


We know how to interpret Eq. (6.3.6): If GDP goes up by a dollar, on average GPDI goes 
up by about 25 cents. How about Eq. (6.3.7)? Here the interpretation is that if the 
(standardized) GDP increases by one standard deviation, on average, the (standardized) GPDI 
increases by about 0.98 standard deviations. 

What is the advantage of the standardized regression model over the traditional model? 
The advantage becomes more apparent if there is more than one regressor, a topic we 
will take up in Chapter 7. By standardizing all regressors, we put them on an equal basis 
and therefore can compare them directly. If the coefficient of a standardized regressor is 
larger than that of another standardized regressor appearing in that model, then the former 
contributes relatively more to the explanation of the regressand than the latter. In other 
words, we can use the beta coefficients as a measure of relative strength of the various 
regressors. But more on this in the next two chapters. 

Before we leave this topic, two points may be noted. First, for the standardized regression 
in Eq. (6.3.7) we have not given the r² value because this is a regression through 
the origin for which the usual r² is not applicable, as pointed out in Section 6.1. Second, 
there is an interesting relationship between the β coefficients of the conventional model 
and the beta coefficients. For the bivariate case, the relationship is as follows: 



$$\hat{\beta}_2^* = \hat{\beta}_2 \left(\frac{S_x}{S_y}\right) \qquad (6.3.8)$$


where $S_x$ = the sample standard deviation of the X regressor and $S_y$ = the sample standard 
deviation of the regressand. Therefore, we can crisscross between the β and beta coefficients 

6 Recall from Eq. (3.1.7) that Intercept = Mean value of the dependent variable − Slope × Mean 
value of the regressor. But for the standardized variables the mean values of the dependent variable 
and the regressor are zero. Hence the intercept value is zero. 

7 Do not confuse these beta coefficients with the beta coefficients of finance theory. 
if we know the (sample) standard deviation of the regressor and regressand. We will see in the 
next chapter that this relationship holds true in the multiple regression also. It is left as an 
exercise for the reader to verify Eq. (6.3.8) for our illustrative example. 
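One way to carry out that exercise is sketched below in plain Python (helper names are illustrative). It computes the conventional slope from Table 6.2, runs the no-intercept regression of standardized GPDI on standardized GDP, and compares the result with $\hat{\beta}_2 (S_x/S_y)$ from Eq. (6.3.8). The standard error uses n − 1 residual degrees of freedom because the standardized regression has no intercept; that is our assumption about how the reported 0.0485 was obtained.

```python
import math
import statistics

# Table 6.2, billions of chained (2000) dollars, 1990-2005.
GPDI_B = [886.6, 829.1, 878.3, 953.5, 1042.3, 1109.6, 1209.2, 1320.6,
          1455.0, 1576.3, 1679.0, 1629.4, 1544.6, 1596.9, 1713.9, 1842.0]
GDP_B = [7112.5, 7100.5, 7336.6, 7532.7, 7835.5, 8031.7, 8328.9, 8703.5,
         9066.9, 9470.3, 9817.0, 9890.7, 10048.8, 10301.0, 10703.5, 11048.6]


def standardize(values):
    """Eqs. (6.3.1)-(6.3.2): subtract the sample mean, divide by the sample s.d."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Conventional slope with intercept, in the original units (Eq. 6.3.6).
x_bar, y_bar = statistics.mean(GDP_B), statistics.mean(GPDI_B)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(GDP_B, GPDI_B))
sxx = sum((x - x_bar) ** 2 for x in GDP_B)
b2 = sxy / sxx

# Beta coefficient: no-intercept OLS of standardized Y on standardized X (Eq. 6.3.5).
zx, zy = standardize(GDP_B), standardize(GPDI_B)
beta2_star = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

# Standard error for the regression through the origin (n - 1 residual df).
n = len(zx)
rss = sum((b - beta2_star * a) ** 2 for a, b in zip(zx, zy))
se_beta2_star = math.sqrt(rss / (n - 1) / sum(a * a for a in zx))

# Eq. (6.3.8): the same number from b2 and the two sample standard deviations.
via_formula = b2 * statistics.stdev(GDP_B) / statistics.stdev(GPDI_B)

print(f"b2 = {b2:.4f}")
print(f"beta2* from standardized regression = {beta2_star:.4f} (se {se_beta2_star:.4f})")
print(f"beta2* from Eq. (6.3.8)             = {via_formula:.4f}")
```

In the bivariate case the beta coefficient also equals the sample correlation between X and Y, so it should be close to √0.9648 ≈ 0.982.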

6.4 Functional Forms of Regression Models 

As noted in Chapter 2, this text is concerned primarily with models that are linear in the 
parameters; they may or may not be linear in the variables. In the sections that follow we 
consider some commonly used regression models that may be nonlinear in the variables but are 
linear in the parameters or that can be made so by suitable transformations of the variables. 
In particular, we discuss the following regression models: 

1. The log-linear model 

2. Semilog models 

3. Reciprocal models 

4. The logarithmic reciprocal model 

We discuss the special features of each model, when they are appropriate, and how they are 
estimated. Each model is illustrated with suitable examples. 

6.5 How to Measure Elasticity: The Log-Linear Model 

Consider the following model, known as the exponential regression model: 

$$Y_i = \beta_1 X_i^{\beta_2} e^{u_i} \qquad (6.5.1)$$

which may be expressed alternatively as⁸ 

$$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_i + u_i \qquad (6.5.2)$$

where ln = natural log (i.e., log to the base e, and where e ≈ 2.718).⁹ 
If we write Eq. (6.5.2) as 

$$\ln Y_i = \alpha + \beta_2 \ln X_i + u_i \qquad (6.5.3)$$

where $\alpha = \ln \beta_1$, this model is linear in the parameters $\alpha$ and $\beta_2$, linear in the logarithms of 
the variables Y and X, and can be estimated by OLS regression. Because of this linearity, 
such models are called log-log, double-log, or log-linear models. See Appendix 6A.3 for 
the properties of logarithms. 

If the assumptions of the classical linear regression model are fulfilled, the parameters 
of Eq. (6.5.3) can be estimated by the OLS method by letting 

$$Y_i^* = \alpha + \beta_2 X_i^* + u_i \qquad (6.5.4)$$

where $Y_i^* = \ln Y_i$ and $X_i^* = \ln X_i$. The OLS estimators $\hat{\alpha}$ and $\hat{\beta}_2$ obtained will be best 
linear unbiased estimators of $\alpha$ and $\beta_2$, respectively. 

8 Note these properties of the logarithms: (1) $\ln(AB) = \ln A + \ln B$, (2) $\ln(A/B) = \ln A - \ln B$, and 
(3) $\ln(A^k) = k \ln A$, assuming that A and B are positive, and where k is some constant. 

9 In practice one may use common logarithms, that is, log to the base 10. The relationship between the 
natural log and common log is: $\ln_e X = 2.3026 \log_{10} X$. By convention, ln means natural logarithm, and 
log means logarithm to the base 10; hence there is no need to write the subscripts e and 10 explicitly. 
FIGURE 6.3 Constant elasticity model. 




One attractive feature of the log-log model, which has made it popular in applied work, 
is that the slope coefficient $\beta_2$ measures the elasticity of Y with respect to X, that is, the 
percentage change in Y for a given (small) percentage change in X.¹⁰ Thus, if Y represents the 
quantity of a commodity demanded and X its unit price, $\beta_2$ measures the price elasticity of 
demand, a parameter of considerable economic interest. If the relationship between quantity 
demanded and price is as shown in Figure 6.3a, the double-log transformation as shown 
in Figure 6.3b will then give the estimate of the price elasticity ($-\beta_2$). 

Two special features of the log-linear model may be noted: The model assumes that 
the elasticity coefficient between Y and X, $\beta_2$, remains constant throughout (why?), hence 
the alternative name constant elasticity model.¹¹ In other words, as Figure 6.3b shows, the 
change in $\ln Y$ per unit change in $\ln X$ (i.e., the elasticity, $\beta_2$) remains the same no matter at 
which $\ln X$ we measure the elasticity. Another feature of the model is that although $\hat{\alpha}$ and 
$\hat{\beta}_2$ are unbiased estimates of $\alpha$ and $\beta_2$, $\beta_1$ (the parameter entering the original model) when 
estimated as $\hat{\beta}_1 = \operatorname{antilog}(\hat{\alpha})$ is itself a biased estimator. In most practical problems, 
however, the intercept term is of secondary importance, and one need not worry about obtaining 
its unbiased estimate.¹² 
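The "(why?)" can be explored numerically. The sketch below approximates the elasticity (dY/dX)(X/Y) with a central finite difference for a constant elasticity curve $Y = \beta_1 X^{\beta_2}$ and for a linear curve $Y = a + bX$. The parameter values are made up purely for illustration. The log-log curve returns $\beta_2$ at every X, whereas the linear curve's elasticity, $bX/(a + bX)$, changes with X.

```python
def point_elasticity(f, x, rel_step=1e-6):
    """Numerical elasticity (dY/dX)(X/Y) of y = f(x) at x, by central difference."""
    h = rel_step * x
    dydx = (f(x + h) - f(x - h)) / (2 * h)
    return dydx * x / f(x)


# Hypothetical parameter values, chosen only for illustration.
beta1, beta2 = 50.0, -1.2   # constant elasticity (log-log) demand curve
a, b = 100.0, -4.0          # linear demand curve Y = a + b X


def log_log(x):
    return beta1 * x ** beta2


def linear(x):
    return a + b * x


for price in (2.0, 5.0, 10.0, 20.0):
    print(f"price {price:5.1f}: log-log elasticity {point_elasticity(log_log, price):.6f}, "
          f"linear elasticity {point_elasticity(linear, price):.6f}")
```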


10 The elasticity coefficient, in calculus notation, is defined as $(dY/Y)/(dX/X) = (dY/dX)(X/Y)$. 
Readers familiar with differential calculus will readily see that $\beta_2$ is in fact the elasticity coefficient. 

A technical note: The calculus-minded reader will note that $d(\ln X)/dX = 1/X$ or $d(\ln X) = dX/X$, 
that is, for infinitesimally small changes (note the differential operator d) the change in $\ln X$ is equal 
to the relative or proportional change in X. In practice, though, if the change in X is small, this 
relationship can be written as: change in $\ln X \approx$ relative change in X, where $\approx$ means approximately. 
Thus, for small changes, 

$$(\ln X_t - \ln X_{t-1}) \approx (X_t - X_{t-1})/X_{t-1} = \text{relative change in } X$$

Incidentally, the reader should note these terms, which will occur frequently: (1) absolute change, 
(2) relative or proportional change, and (3) percentage change, or percent growth rate. 

Thus, $(X_t - X_{t-1})$ represents absolute change, $(X_t - X_{t-1})/X_{t-1} = (X_t/X_{t-1} - 1)$ is relative or 
proportional change, and $[(X_t - X_{t-1})/X_{t-1}]100$ is the percentage change, or the growth rate. 

$X_t$ and $X_{t-1}$ are, respectively, the current and previous values of the variable X. 

11 A constant elasticity model will give a constant total revenue change for a given percentage change 
in price regardless of the absolute level of price. Readers should contrast this result with the elasticity 
conditions implied by a simple linear demand function, $Y_i = \beta_1 + \beta_2 X_i + u_i$. However, a simple linear 
function gives a constant quantity change per unit change in price. Contrast this with what the 
log-linear model implies for a given dollar change in price. 

12 Concerning the nature of the bias and what can be done about it, see Arthur S. Goldberger, Topics 
in Regression Analysis, Macmillan, New York, 1978, p. 120. 
In the two-variable model, the simplest way to decide whether the log-linear model fits 
the data is to plot the scattergram of $\ln Y_i$ against $\ln X_i$ and see if the scatter points lie 
approximately on a straight line, as in Figure 6.3b. 

A cautionary note: The reader should be aware of the distinction between a percent 
change and a percentage point change. For example, the unemployment rate is often 
expressed in percent form, say, the unemployment rate of 6%. If this rate goes to 8%, we say 
that the percentage point change in the unemployment rate is 2, whereas the percent change 
in the unemployment rate is (8 — 6)/6, or about 33%. So be careful when you deal with 
percent and percentage point changes, for the two are very different concepts. 
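The distinctions drawn in footnote 10 and in the cautionary note above can be written as small helper functions. In this sketch the function names are illustrative and do not come from the text. It reproduces the unemployment example and shows that the log difference tracks the relative change only when the change is small.

```python
import math


def absolute_change(prev, curr):
    return curr - prev


def relative_change(prev, curr):
    return (curr - prev) / prev              # equals curr/prev - 1


def percent_change(prev, curr):
    return 100 * relative_change(prev, curr)


def log_change(prev, curr):
    return math.log(curr) - math.log(prev)   # close to the relative change only for small moves


# Unemployment rate example from the text: 6% -> 8%.
print("percentage-point change:", absolute_change(6.0, 8.0))      # 2.0
print("percent change:", round(percent_change(6.0, 8.0), 2))       # 33.33

# Log difference vs. relative change: close for a small move, not for a large one.
for prev, curr in [(100.0, 101.0), (6.0, 8.0)]:
    print(prev, curr, round(relative_change(prev, curr), 4), round(log_change(prev, curr), 4))
```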


EXAMPLE 6.3 Expenditure on Durable Goods in Relation to Total Personal Consumption Expenditure 

Table 6.3 presents data on total personal consumption expenditure (PCEXP), expenditure on 
durable goods (EXPDUR), expenditure on nondurable goods (EXPNONDUR), and expenditure 
on services (EXPSERVICES), all measured in billions of chained (2000) dollars.¹³ 

Suppose we wish to find the elasticity of expenditure on durable goods with respect 
to total personal consumption expenditure. Plotting the log of expenditure on durable 
goods against the log of total personal consumption expenditure, you will see that 
the relationship between the two variables is linear. Hence, the double-log model may be 
appropriate. The regression results are as follows: 

$$\widehat{\ln \text{EXPDUR}}_t = -7.5417 + 1.6266 \ln \text{PCEXP}_t$$

$$\text{se} = (0.7161)\quad(0.0800) \qquad (6.5.5)$$

$$t = (-10.5309)^{*}\quad(20.3152)^{*} \qquad r^2 = 0.9695$$

where * indicates that the p value is extremely small. 


TABLE 6.3 Total Personal Expenditure and Categories 
(Billions of chained [2000] dollars; quarterly data at seasonally adjusted annual rates) 

Year or quarter   EXPSERVICES   EXPDUR    EXPNONDUR   PCEXP
2003-I            4,143.3       971.4     2,072.5     7,184.9
2003-II           4,161.3       1,009.8   2,084.2     7,249.3
2003-III          4,190.7       1,049.6   2,123.0     7,352.9
2003-IV           4,220.2       1,051.4   2,132.5     7,394.3
2004-I            4,268.2       1,067.0   2,155.3     7,479.8
2004-II           4,308.4       1,071.4   2,164.3     7,534.4
2004-III          4,341.5       1,093.9   2,184.0     7,607.1
2004-IV           4,377.4       1,110.3   2,213.1     7,687.1
2005-I            4,395.3       1,116.8   2,241.5     7,739.4
2005-II           4,420.0       1,150.8   2,268.4     7,819.8
2005-III          4,454.5       1,175.9   2,287.6     7,895.3
2005-IV           4,476.7       1,137.9   2,309.6     7,910.2
2006-I            4,494.5       1,190.5   2,342.8     8,003.8
2006-II           4,535.4       1,190.3   2,351.1     8,055.0
2006-III          4,566.6       1,208.8   2,360.1     8,111.2

Note: See Table B-2 for data for total personal consumption expenditures for 1959-1989. 
EXPSERVICES = expenditure on services, billions of 2000 dollars. 
EXPDUR = expenditure on durable goods, billions of 2000 dollars. 
EXPNONDUR = expenditure on nondurable goods, billions of 2000 dollars. 
PCEXP = total personal consumption expenditure, billions of 2000 dollars. 

Sources: Department of Commerce, Bureau of Economic Analysis. Economic Report of the President, 2007, 
Table B-17, p. 347. 




13 Durable goods include motor vehicles and parts, furniture, and household equipment; nondurable goods include food, clothing, gasoline and oil, fuel oil and coal; and services include housing, electricity and gas, transportation, and medical care.
EXAMPLE 6.3 (Continued)

As these results show, the elasticity of EXPDUR with respect to PCEXP is about 1.63, suggesting that if total personal expenditure goes up by 1 percent, on average, the expenditure on durable goods goes up by about 1.63 percent. Thus, expenditure on durable goods is very responsive to changes in personal consumption expenditure. This is one reason why producers of durable goods keep a keen eye on changes in personal income and personal consumption expenditure. In Exercise 6.18, the reader is asked to carry out a similar exercise for nondurable goods expenditure.
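The double-log regression (6.5.5) can be reproduced with a few lines of code. The following Python sketch is a minimal illustration, not the software used to obtain the results above: `ols_fit` is a hypothetical helper that computes the two-variable OLS estimates from their textbook formulas, and the data are the 15 quarterly observations of Table 6.3. Its estimates should be close to those reported in Eq. (6.5.5).

```python
import math

def ols_fit(x, y):
    """Two-variable OLS: return (intercept, slope) for y = b1 + b2 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Table 6.3, 2003-I to 2006-III, billions of chained 2000 dollars
expdur = [971.4, 1009.8, 1049.6, 1051.4, 1067.0, 1071.4, 1093.9, 1110.3,
          1116.8, 1150.8, 1175.9, 1137.9, 1190.5, 1190.3, 1208.8]
pcexp = [7184.9, 7249.3, 7352.9, 7394.3, 7479.8, 7534.4, 7607.1, 7687.1,
         7739.4, 7819.8, 7895.3, 7910.2, 8003.8, 8055.0, 8111.2]

# Double-log model: ln EXPDUR = b1 + b2 ln PCEXP, so b2 is the elasticity
b1, b2 = ols_fit([math.log(v) for v in pcexp], [math.log(v) for v in expdur])
print(f"intercept = {b1:.4f}, elasticity = {b2:.4f}")

# Exact percent change in EXPDUR implied by a 1 percent rise in PCEXP
print(f"1% rise in PCEXP -> {100 * (1.01 ** b2 - 1):.2f}% rise in EXPDUR")
```

Because both variables enter in logs, the slope is read directly as an elasticity; the last line converts it into the percentage change implied by an exact 1 percent rise in PCEXP, which is very close to the slope itself for such a small change.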


6.6 Semilog Models: Log-Lin and Lin-Log Models 

How to Measure the Growth Rate: The Log-Lin Model

Economists, businesspeople, and governments are often interested in finding out the rate of growth of certain economic variables, such as population, GNP, money supply, employment, productivity, and trade deficit.

Suppose we want to find out the growth rate of personal consumption expenditure on services for the data given in Table 6.3. Let $Y_t$ denote real expenditure on services at time t and $Y_0$ the initial value of the expenditure on services (i.e., the value at the end of 2002-IV). You may recall the following well-known compound interest formula from your introductory course in economics.

$$Y_t = Y_0(1 + r)^t \qquad (6.6.1)$$

where $r$ is the compound (i.e., over time) rate of growth of $Y$. Taking the natural logarithm of Equation 6.6.1, we can write

$$\ln Y_t = \ln Y_0 + t \ln(1 + r) \qquad (6.6.2)$$

Now letting

$$\beta_1 = \ln Y_0 \qquad (6.6.3)$$

$$\beta_2 = \ln(1 + r) \qquad (6.6.4)$$

we can write Equation 6.6.2 as

$$\ln Y_t = \beta_1 + \beta_2 t \qquad (6.6.5)$$

Adding the disturbance term to Equation 6.6.5, we obtain$^{14}$

$$\ln Y_t = \beta_1 + \beta_2 t + u_t \qquad (6.6.6)$$

This model is like any other linear regression model in that the parameters $\beta_1$ and $\beta_2$ are linear. The only difference is that the regressand is the logarithm of $Y$ and the regressor is “time,” which will take values of 1, 2, 3, etc.

Models like Eq. (6.6.6) are called semilog models because only one variable (in this 
case the regressand) appears in the logarithmic form. For descriptive purposes a model in 
which the regressand is logarithmic will be called a log-lin model. Later we will consider 
a model in which the regressand is linear but the regressor(s) is logarithmic and call it a 
lin-log model. 
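The algebra of Eqs. (6.6.1) to (6.6.5) can be checked numerically. The Python sketch below is a minimal illustration under assumed values: it generates a series that grows exactly according to the compound interest formula (6.6.1), with a hypothetical starting value $Y_0 = 4000$ and a hypothetical quarterly growth rate $r = 0.007$, and then regresses $\ln Y_t$ on $t = 1, 2, \ldots, 15$ by OLS (`ols_fit` is a hypothetical helper). Because no disturbance term is added, the estimated intercept and slope should reproduce $\ln Y_0$ and $\ln(1 + r)$, as Eqs. (6.6.3) and (6.6.4) state.

```python
import math

def ols_fit(x, y):
    """Two-variable OLS: return (intercept, slope) for y = b1 + b2 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

y0, r = 4000.0, 0.007                      # hypothetical values for illustration
t = list(range(1, 16))
y = [y0 * (1 + r) ** ti for ti in t]       # Eq. (6.6.1), no error term

b1, b2 = ols_fit(t, [math.log(v) for v in y])   # Eq. (6.6.5)
print(f"b1 = {b1:.6f}   ln(Y0)    = {math.log(y0):.6f}")
print(f"b2 = {b2:.6f}   ln(1 + r) = {math.log(1 + r):.6f}")
```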


14 We add the error term because the compound interest formula will not hold exactly. Why we add 
the error after the logarithmic transformation is explained in Sec. 6.8. 
Before we present the regression results, let us examine the properties of model (6.6.5).
In this model the slope coefficient measures the constant proportional or relative change 
in Y for a given absolute change in the value of the regressor (in this case the variable t), 
that is, 15 


$$\beta_2 = \frac{\text{relative change in regressand}}{\text{absolute change in regressor}} \qquad (6.6.7)$$

If we multiply the relative change in Y by 100, Equation 6.6.7 will then give the percentage change, or the growth rate, in Y for an absolute change in X, the regressor. That is, 100 times $\beta_2$ gives the growth rate in Y; 100 times $\beta_2$ is known in the literature as the semielasticity of Y with respect to X. (Question: To get the elasticity, what will we have to do?)$^{16}$


EXAMPLE 6.4 The Rate of Growth of Expenditure on Services

To illustrate the growth model (6.6.6), consider the data on expenditure on services given in Table 6.3. The regression results over time (t) are as follows:

$$\widehat{\ln \text{EXS}_t} = 8.3226 + 0.00705\,t \qquad (6.6.8)$$

se = (0.0016)   (0.00018)   $r^2 = 0.9919$
t = (5201.625)*   (39.1667)*

Note: EXS stands for expenditure on services, and * denotes that the p value is extremely small.

The interpretation of Equation 6.6.8 is that over the quarterly period 2003-I to 2006-III, expenditure on services increased at the (quarterly) rate of 0.705 percent. Roughly, this is equal to an annual growth rate of 2.82 percent. Since 8.3226 is the log of EXS at the beginning of the study period, by taking its antilog we obtain 4115.96 (billion dollars) as the beginning value of EXS (i.e., the value at the beginning of 2003). The regression line obtained in Eq. (6.6.8) is sketched in Figure 6.4.

FIGURE 6.4 Regression line (6.6.8) (horizontal axis: Time).
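A minimal Python sketch of the estimation behind Eq. (6.6.8): it regresses the log of the EXPSERVICES column of Table 6.3 on a time trend $t = 1, \ldots, 15$ and converts the estimates into the quantities discussed in Example 6.4. The helper `ols_fit` is a hypothetical illustration of the two-variable OLS formulas; the printed values should agree with Eq. (6.6.8) up to rounding.

```python
import math

def ols_fit(x, y):
    """Two-variable OLS: return (intercept, slope) for y = b1 + b2 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# EXPSERVICES (EXS), Table 6.3, 2003-I to 2006-III
exs = [4143.3, 4161.3, 4190.7, 4220.2, 4268.2, 4308.4, 4341.5, 4377.4,
       4395.3, 4420.0, 4454.5, 4476.7, 4494.5, 4535.4, 4566.6]
t = list(range(1, len(exs) + 1))

b1, b2 = ols_fit(t, [math.log(v) for v in exs])    # log-lin model (6.6.6)
print(f"ln EXS = {b1:.4f} + {b2:.5f} t")
print(f"quarterly growth rate: {100 * b2:.3f}%")
print(f"rough annual rate (4 x quarterly): {400 * b2:.2f}%")
# Antilog of the intercept = fitted EXS at t = 0, the start of the sample
print(f"fitted starting value of EXS: {math.exp(b1):.2f}")
```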


15 Using differential calculus one can show that $\beta_2 = d(\ln Y)/dX = (1/Y)(dY/dX) = (dY/Y)/dX$, which is nothing but Eq. (6.6.7). For small changes in Y and X this relation may be approximated by

$$\frac{(Y_t - Y_{t-1})/Y_{t-1}}{X_t - X_{t-1}}$$

Note: Here, X = t.

16 See Appendix 6A.4 for various growth formulas. 
Instantaneous versus Compound Rate of Growth

The coefficient of the trend variable in the growth model (6.6.6), $\beta_2$, gives the instantaneous (at a point in time) rate of growth and not the compound (over a period of time) rate of growth. But the latter can be easily found from Eq. (6.6.4) by taking the antilog of the estimated $\beta_2$, subtracting 1 from it, and multiplying the difference by 100. Thus, for our illustrative example, the estimated slope coefficient is 0.00705. Therefore, [antilog(0.00705) - 1] = 0.00708, or 0.708 percent. Thus, in the illustrative example, the compound rate of growth of expenditure on services was about 0.708 percent per quarter, which is slightly higher than the instantaneous growth rate of 0.705 percent. This is, of course, due to the compounding effect.
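The conversion from instantaneous to compound growth amounts to one line of arithmetic. In the Python sketch below, which uses only the rounded coefficient 0.00705 reported in Eq. (6.6.8), the compound rate comes out at about 0.7075 percent per quarter; the small gap from the 0.708 percent quoted above is a matter of rounding the coefficient.

```python
import math

b2 = 0.00705                               # slope of Eq. (6.6.8), rounded
instantaneous = 100 * b2                   # percent per quarter
compound = 100 * (math.exp(b2) - 1)        # Eq. (6.6.4) solved for r: r = e**b2 - 1

print(f"instantaneous growth rate: {instantaneous:.3f}% per quarter")
print(f"compound growth rate:      {compound:.4f}% per quarter")
```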

Linear Trend Model 

Instead of estimating model (6.6.6), researchers sometimes estimate the following model: 

$$Y_t = \beta_1 + \beta_2 t + u_t \qquad (6.6.9)$$


That is, instead of regressing the log of Y on time, they regress Y on time, where Y is the 
regressand under consideration. Such a model is called a linear trend model and the 
time variable t is known as the trend variable. If the slope coefficient in Equation 6.6.9 is 
positive, there is an upward trend in Y, whereas if it is negative, there is a downward 
trend in Y. 

For the expenditure on services data that we considered earlier, the results of fitting the 
linear trend model (6.6.9) are as follows: 


$$\widehat{\text{EXS}}_t = 4111.545 + 30.674\,t \qquad (6.6.10)$$

t = (655.5628)   (44.4671)


In contrast to Eq. (6.6.8), the interpretation of Eq. (6.6.10) is as follows: Over the quarterly period 2003-I to 2006-III, on average, expenditure on services increased at the absolute (note: not relative) rate of about 30 billion dollars per quarter. That is, there was an upward trend in the expenditure on services.
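For comparison with the log-lin fit of Eq. (6.6.8), the linear trend model (6.6.9) can be fitted to the same EXPSERVICES figures from Table 6.3 by OLS on the untransformed series. The Python below is a minimal sketch with a hypothetical `ols_fit` helper; on these data it returns an intercept of about 4111.545 and a slope of about 30.674, in line with Eq. (6.6.10).

```python
def ols_fit(x, y):
    """Two-variable OLS: return (intercept, slope) for y = b1 + b2 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# EXPSERVICES (EXS), Table 6.3, 2003-I to 2006-III, billions of 2000 dollars
exs = [4143.3, 4161.3, 4190.7, 4220.2, 4268.2, 4308.4, 4341.5, 4377.4,
       4395.3, 4420.0, 4454.5, 4476.7, 4494.5, 4535.4, 4566.6]
t = list(range(1, len(exs) + 1))

b1, b2 = ols_fit(t, exs)                   # linear trend model (6.6.9)
print(f"EXS = {b1:.3f} + {b2:.3f} t")      # b2: absolute change per quarter
```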

The choice between the growth rate model (6.6.8) and the linear trend model (6.6.10) will depend upon whether one is interested in the relative or absolute change in the expenditure on services, although for comparative purposes it is the relative change that is generally more relevant. In passing, observe that we cannot compare the $r^2$ values of models (6.6.8) and (6.6.10) because the regressands in the two models are different. We will show in Chapter 7 how one compares the $r^2$'s of models like (6.6.8) and (6.6.10).


The Lin-Log Model 

Unlike the growth model just discussed, in which we were interested in finding the percent growth in Y for an absolute change in X, suppose we now want to find the absolute change in Y for a percent change in X. A model that can accomplish this purpose can be written as:

$$Y_i = \beta_1 + \beta_2 \ln X_i + u_i \qquad (6.6.11)$$


For descriptive purposes we call such a model a lin-log model. 
Let us interpret the slope coefficient $\beta_2$.$^{17}$ As usual,

$$\beta_2 = \frac{\text{change in } Y}{\text{change in } \ln X} = \frac{\text{change in } Y}{\text{relative change in } X}$$

The second step follows from the fact that a change in the log of a number is a relative change.

Symbolically, we have

$$\beta_2 = \frac{\Delta Y}{\Delta X / X} \qquad (6.6.12)$$

where, as usual, $\Delta$ denotes a small change. Equation 6.6.12 can be written, equivalently, as

$$\Delta Y = \beta_2 (\Delta X / X) \qquad (6.6.13)$$

This equation states that the absolute change in Y ($= \Delta Y$) is equal to the slope times the relative change in X. If the latter is multiplied by 100, then Eq. (6.6.13) gives the absolute change in Y for a percentage change in X. Thus, if $(\Delta X/X)$ changes by 0.01 unit (or 1 percent), the absolute change in Y is $0.01(\beta_2)$; if in an application one finds that $\beta_2 = 500$, the absolute change in Y is (0.01)(500) = 5.0. Therefore, when regression (6.6.11) is estimated by OLS, do not forget to multiply the value of the estimated slope coefficient by 0.01, or, what amounts to the same thing, divide it by 100. If you do not keep this in mind, your interpretation in an application will be highly misleading.

The practical question is: When is a lin-log model like Eq. (6.6.11) useful? An interesting application has been found in the so-called Engel expenditure models, named after the German statistician Ernst Engel, 1821-1896. (See Exercise 6.10.) Engel postulated that “the total expenditure that is devoted to food tends to increase in arithmetic progression as total expenditure increases in geometric progression.”$^{18}$

EXAMPLE 6.5

As an illustration of the lin-log model, let us revisit our example on food expenditure in India, Example 3.2. There we fitted a linear-in-variables model as a first approximation. But if we plot the data, we obtain the plot in Figure 6.5. As this figure suggests, food expenditure increases more slowly as total expenditure increases, perhaps giving credence to Engel's law. The results of fitting the lin-log model to the data are as follows:

$$\widehat{\text{FoodExp}}_i = -1283.912 + 257.2700 \ln \text{TotalExp}_i \qquad (6.6.14)$$

t = (-4.3848)*   (5.6625)*   $r^2 = 0.3769$

Note: * denotes an extremely small p value.


17 Again, using differential calculus, we have

$$\frac{dY}{dX} = \beta_2\left(\frac{1}{X}\right)$$

Therefore,

$$\beta_2 = \frac{dY}{dX/X}$$

which is Eq. (6.6.12).

18 See Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, London, 1998, p. 158. This quote is attributed to H. Working, "Statistical Laws of Family Expenditure," Journal of the American Statistical Association, vol. 38, 1943, pp. 43-56.


Interpreted in the manner described earlier, the slope coefficient of about 257 means that an increase in total expenditure of 1 percent, on average, leads to an increase of about 2.57 rupees in the expenditure on food of the 55 families included in the sample. (Note: We have divided the estimated slope coefficient by 100.)
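The divide-by-100 rule can be checked against the exact change implied by the lin-log form. In the Python sketch below, which uses only the slope reported in Eq. (6.6.14), a 1 percent rise in total expenditure changes ln TotalExp by exactly ln(1.01), so the exact change in predicted food expenditure is $\hat\beta_2 \ln(1.01)$, about 2.56 rupees, against 2.57 rupees from the rule of thumb. The two agree closely because ln(1.01) is approximately 0.01.

```python
import math

b2 = 257.2700   # slope of the lin-log model (6.6.14)

# Rule of thumb from Eq. (6.6.13): change in Y for a 1 percent change in X
approx = 0.01 * b2
# Exact change when X rises by 1 percent: b2 * [ln(1.01 X) - ln(X)] = b2 * ln(1.01)
exact = b2 * math.log(1.01)

print(f"approximate: {approx:.3f} rupees, exact: {exact:.3f} rupees")
```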

Before proceeding further, note that if you want to compute the elasticity coefficient for the log-lin or lin-log models, you can do so from the definition of the elasticity coefficient given before, namely,

$$\text{Elasticity} = \frac{dY}{dX}\,\frac{X}{Y}$$

As a matter of fact, once the functional form of a model is known, one can compute elasticities by applying the preceding definition. (Table 6.6, given later, summarizes the elasticity coefficients for the various models.)
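The definition above can be applied mechanically to any fitted functional form. The Python sketch below is an illustration under stated assumptions: `point_elasticity` is a hypothetical helper that approximates dY/dX with a central finite difference and multiplies by X/Y. Applied to the lin-log food-expenditure equation (6.6.14) at a hypothetical total expenditure of 700 rupees, it should agree with the closed form $\beta_2 / Y$ obtained by differentiating Eq. (6.6.11).

```python
import math

def point_elasticity(f, x, rel_step=1e-6):
    """(dY/dX)(X/Y) for Y = f(X), with dY/dX from a central difference."""
    h = rel_step * x
    dydx = (f(x + h) - f(x - h)) / (2 * h)
    return dydx * x / f(x)

def food_exp(total_exp):
    """Fitted lin-log model (6.6.14)."""
    return -1283.912 + 257.2700 * math.log(total_exp)

x0 = 700.0                                # hypothetical evaluation point
numeric = point_elasticity(food_exp, x0)
closed_form = 257.2700 / food_exp(x0)     # lin-log elasticity: beta2 / Y
print(f"numerical: {numeric:.6f}, closed form: {closed_form:.6f}")
```

The same helper works for any differentiable fitted curve, which is the point of the remark above: only the functional form is needed.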


FIGURE 6.5 
It may be noted that sometimes the logarithmic transformation is used to reduce heteroscedasticity as well as skewness. (See Chapter 11.) A common feature of many economic variables is that they are positively skewed (e.g., size distribution of firms or distribution of income or wealth) and they are heteroscedastic. A logarithmic transformation of such variables reduces both skewness and heteroscedasticity. That is why labor economists often use the logarithms of wages in the regression of wages on, say, schooling, as measured by years of education.
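The effect of the log transformation on skewness is easy to see in simulated data. The Python sketch below is illustrative only: it draws hypothetical log-normally distributed "income" values (the parameters 10 and 1 are arbitrary choices) and compares the moment-based sample skewness of the raw values with that of their logarithms. The raw series is strongly right-skewed, while its log is close to symmetric; here that holds by construction, since the log of a log-normal variable is normal, and real data will rarely be this clean.

```python
import math
import random

def skewness(data):
    """Moment estimator of skewness: mean of cubed standardized deviations."""
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((v - m) ** 2 for v in data) / n)
    return sum(((v - m) / s) ** 3 for v in data) / n

random.seed(42)
income = [random.lognormvariate(10, 1) for _ in range(20000)]  # hypothetical data
log_income = [math.log(v) for v in income]

print(f"skewness of income:     {skewness(income):.2f}")
print(f"skewness of log income: {skewness(log_income):.2f}")
```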

6.7 Reciprocal Models 

Models of the following type are known as reciprocal models. 

$$Y_i = \beta_1 + \beta_2\left(\frac{1}{X_i}\right) + u_i \qquad (6.7.1)$$

Although this model is nonlinear in the variable X because it enters inversely or reciprocally, the model is linear in $\beta_1$ and $\beta_2$ and is therefore a linear regression model.$^{19}$

This model has these features: As X increases indefinitely, the term $\beta_2(1/X)$ approaches zero (note: $\beta_2$ is a constant) and Y approaches the limiting or asymptotic value $\beta_1$.


19 If we let $X_i^* = (1/X_i)$, then Eq. (6.7.1) is linear in the parameters as well as the variables $Y_i$ and $X_i^*$.
FIGURE 6.6 The reciprocal model.

Therefore, models like (6.7.1) have built in them an asymptote or limit value that the dependent variable will take when the value of the X variable increases indefinitely.$^{20}$ Some likely shapes of the curve corresponding to Eq. (6.7.1) are shown in Figure 6.6.


EXAMPLE 6.6

As an illustration of Figure 6.6a, consider the data given in Table 6.4. These are cross-sectional data for 64 countries on child mortality and a few other variables. For now, concentrate on the variables child mortality (CM) and per capita GNP, which are plotted in Figure 6.7.

As you can see, this figure resembles Figure 6.6a: As per capita GNP increases, one would expect child mortality to decrease because people can afford to spend more on health care, assuming all other factors remain constant. But the relationship is not a straight-line one: As per capita GNP increases, initially there is a dramatic drop in CM, but the drop tapers off as per capita GNP continues to increase.


FIGURE 6.7 Relationship between child mortality and per capita GNP in 64 countries (panel title: Child Mortality and PGNP).


20 The slope of Eq. (6.7.1) is $dY/dX = -\beta_2(1/X^2)$, implying that if $\beta_2$ is positive, the slope is negative throughout, and if $\beta_2$ is negative, the slope is positive throughout. See Figures 6.6a and 6.6c, respectively.


TABLE 6.4 Fertility and Other Data for 64 Countries

Observation  CM   FLFP  PGNP   TFR     Observation  CM   FLFP  PGNP   TFR
1            128  37    1870   6.66    33           142  50    8640   7.17
2            204  22    130    6.15    34           104  62    350    6.60
3            202  16    310    7.00    35           287  31    230    7.00
4            197  65    570    6.25    36           41   66    1620   3.91
5            96   76    2050   3.81    37           312  11    190    6.70
6            209  26    200    6.44    38           77   88    2090   4.20
7            170  45    670    6.19    39           142  22    900    5.43
8            240  29    300    5.89    40           262  22    230    6.50
9            241  11    120    5.89    41           215  12    140    6.25
10           55   55    290    2.36    42           246  9     330    7.10
11           75   87    1180   3.93    43           191  31    1010   7.10
12           129  55    900    5.99    44           182  19    300    7.00
13           24   93    1730   3.50    45           37   88    1730   3.46
14           165  31    1150   7.41    46           103  35    780    5.66
15           94   77    1160   4.21    47           67   85    1300   4.82
16           96   80    1270   5.00    48           143  78    930    5.00
17           148  30    580    5.27    49           83   85    690    4.74
18           98   69    660    5.21    50           223  33    200    8.49
19           161  43    420    6.50    51           240  19    450    6.50
20           118  47    1080   6.12    52           312  21    280    6.50
21           269  17    290    6.19    53           12   79    4430   1.69
22           189  35    270    5.05    54           52   83    270    3.25
23           126  58    560    6.16    55           79   43    1340   7.17
24           12   81    4240   1.80    56           61   88    670    3.52
25           167  29    240    4.75    57           168  28    410    6.09
26           135  65    430    4.10    58           28   95    4370   2.86
27           107  87    3020   6.66    59           121  41    1310   4.88
28           72   63    1420   7.28    60           115  62    1470   3.89
29           128  49    420    8.12    61           186  45    300    6.90
30           27   63    19830  5.23    62           47   85    3630   4.10
31           152  84    420    5.79    63           178  45    220    6.09
32           224  23    530    6.50    64           142  67    560    7.20

Note: CM = Child mortality, the number of deaths of children under age 5 in a year per 1000 live births.
FLFP = Female literacy rate, percent.
PGNP = per capita GNP in 1980.
TFR = total fertility rate, 1980-1985, the average number of children born to a woman, using age-specific fertility rates for a given year.

Source: Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, London, 1998, p. 456.

If we fit the reciprocal model (6.7.1), we obtain the following regression results:

$$\widehat{CM}_i = 81.79436 + 27{,}273.17\left(\frac{1}{PGNP_i}\right) \qquad (6.7.2)$$

se = (10.8321)   (3759.999)
t = (7.5511)   (7.2535)   $r^2 = 0.4590$

As per capita GNP increases indefinitely, child mortality approaches its asymptotic value of about 82 deaths per thousand. As explained in footnote 20, the positive value of the coefficient of $(1/PGNP_i)$ implies that the rate of change of CM with respect to PGNP is negative.
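The asymptote and the sign of the slope can be read off the fitted equation numerically. The Python sketch below plugs a few per capita GNP values into Eq. (6.7.2), using the coefficients 81.79436 and 27,273.17 (the slope value consistent with the reported standard error and t ratio, since 3759.999 × 7.2535 ≈ 27,273), and evaluates the slope formula of footnote 20. As PGNP grows, predicted CM falls toward about 81.8 and the slope shrinks toward zero.

```python
b1, b2 = 81.79436, 27273.17   # coefficients of Eq. (6.7.2)

def predicted_cm(pgnp):
    """Fitted child mortality (deaths per 1000 live births) from Eq. (6.7.2)."""
    return b1 + b2 / pgnp

def slope(pgnp):
    """dCM/dPGNP = -b2 / PGNP**2 (footnote 20); negative because b2 > 0."""
    return -b2 / pgnp ** 2

for pgnp in (200, 1000, 5000, 20000, 1_000_000):
    print(f"PGNP = {pgnp:>9}: CM = {predicted_cm(pgnp):8.2f}, slope = {slope(pgnp):.6f}")
```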
One of the important applications of Figure 6.6b is the celebrated Phillips curve of macroeconomics. Using the data on percent rate of change of money wages (Y) and the unemployment rate (X) for the United Kingdom for the period 1861-1957, Phillips obtained a curve whose general shape resembles Figure 6.6b (Figure 6.8).$^{21}$

As Figure 6.8 shows, there is an asymmetry in the response of wage changes to the level of the unemployment rate: Wages rise faster for a unit change in unemployment if the unemployment rate is below $U^N$, which is called the natural rate of unemployment by economists (defined as the rate of unemployment required to keep [wage] inflation constant), and then they fall slowly for an equivalent change when the unemployment rate is above the natural rate, $U^N$, indicating the asymptotic floor, or $-\beta_1$, for wage change. This particular feature of the Phillips curve may be due to institutional factors, such as union bargaining power, minimum wages, unemployment compensation, etc.

Since the publication of Phillips's article, there has been very extensive research on the 
Phillips curve at the theoretical as well as empirical levels. Space does not permit us to go 
into the details of the controversy surrounding the Phillips curve. The Phillips curve itself 
has gone through several incarnations. A comparatively recent formulation is provided by 
Olivier Blanchard.$^{22}$ If we let $\pi_t$ denote the inflation rate at time t, which is defined as the percentage change in the price level as measured by a representative price index, such as the Consumer Price Index (CPI), and $UN_t$ denote the unemployment rate at time t, then a modern version of the Phillips curve can be expressed in the following format:

$$\pi_t - \pi_t^e = \beta_2(UN_t - U^N) + u_t \qquad (6.7.3)$$

where $\pi_t$ = actual inflation rate at time t
$\pi_t^e$ = expected inflation rate at time t, the expectation being formed in year (t - 1)


21 A. W. Phillips, "The Relationship between Unemployment and the Rate of Change of Money Wages in the United Kingdom, 1861-1957," Economica, November 1958, vol. 25, pp. 283-299. Note that the original curve did not cross the unemployment rate axis, but Fig. 6.8 represents a later version of the curve.

22 See Olivier Blanchard, Macroeconomics, Prentice Hall, Englewood Cliffs, NJ, 1997, Chap. 17.
$UN_t$ = actual unemployment rate prevailing at time t
$U^N$ = natural rate of unemployment
$u_t$ = stochastic error term$^{23}$

Since $\pi_t^e$ is not directly observable, as a starting point one can make the simplifying assumption that $\pi_t^e = \pi_{t-1}$; that is, the inflation rate expected this year is the inflation rate that prevailed in the last year; of course, more complicated assumptions about expectations formation can be made, and we will discuss this topic in Chapter 17, on distributed lag models.

Substituting this assumption into Eq. (6.7.3) and writing the regression model in the standard form, we obtain the following estimating equation:

$$\pi_t - \pi_{t-1} = \beta_1 + \beta_2 UN_t + u_t \qquad (6.7.4)$$

where $\beta_1 = -\beta_2 U^N$. Equation 6.7.4 states that the change in the inflation rate between two time periods is linearly related to the current unemployment rate. A priori, $\beta_2$ is expected to be negative (why?) and $\beta_1$ is expected to be positive (this figures, since $\beta_2$ is negative and $U^N$ is positive).

Incidentally, the Phillips relationship given in Eq. (6.7.3) is known in the literature as the modified Phillips curve, or the expectations-augmented Phillips curve (to indicate that $\pi_{t-1}$ stands for expected inflation), or the accelerationist Phillips curve (to suggest that a low unemployment rate leads to an increase in the inflation rate and hence an acceleration of the price level).


EXAMPLE 6.7

As an illustration of the modified Phillips curve, we present in Table 6.5 data on inflation as measured by the year-to-year percentage change in the Consumer Price Index (CPIflation) and the unemployment rate for the period 1960-2006. The unemployment rate represents the civilian unemployment rate. From these data we obtained the change in the inflation rate $(\pi_t - \pi_{t-1})$ and plotted it against the civilian unemployment rate; we are using the CPI as a measure of inflation. The resulting graph appears in Figure 6.9.
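The regressand of Eq. (6.7.4) is the first difference of the inflation series, so one observation is lost at the start of the sample. The Python sketch below shows the construction using only the first six years of Table 6.5 (1960-1965) for brevity; the same loop applies unchanged to the full 1960-2006 series, which yields 46 differenced observations.

```python
# Table 6.5, 1960-1965 only (extend the lists to cover 1960-2006)
years = [1960, 1961, 1962, 1963, 1964, 1965]
inflrate = [1.718, 1.014, 1.003, 1.325, 1.307, 1.613]   # percent
unrate = [5.5, 6.7, 5.5, 5.7, 5.2, 4.5]                 # percent

# Change in inflation, pi_t - pi_{t-1}; 1960 has no predecessor and is dropped
d_infl = [round(inflrate[i] - inflrate[i - 1], 3) for i in range(1, len(inflrate))]

# Pair each change with the same year's unemployment rate, as in Eq. (6.7.4)
for year, d, un in zip(years[1:], d_infl, unrate[1:]):
    print(f"{year}: change in inflation = {d:+.3f}, unemployment = {un}")
```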

As expected, the relation between the change in inflation rate and the unemployment 
rate is negative—a low unemployment rate leads to an increase in the inflation rate and 
therefore an acceleration of the price level, hence the name accelerationist Phillips curve. 

Looking at Figure 6.9, it is not obvious whether a linear (straight-line) regression model or a reciprocal model fits the data; there may be a curvilinear relationship between the two variables. We present below regressions based on both models. However, keep in mind that for the reciprocal model the intercept term is expected to be negative and the slope positive, as noted in footnote 20.


Linear model:

$$\widehat{(\pi_t - \pi_{t-1})} = 3.7844 - 0.6385\,UN_t \qquad (6.7.5)$$

t = (4.1912)   (-4.2756)   $r^2 = 0.2935$

Reciprocal model:

$$\widehat{(\pi_t - \pi_{t-1})} = -3.0684 + 17.2077\left(\frac{1}{UN_t}\right) \qquad (6.7.6)$$

t = (-3.1635)   (3.2886)   $r^2 = 0.1973$

All the estimated coefficients in both models are individually statistically significant, all the p values being lower than the 0.005 level.


23 Economists believe this error term represents some kind of supply shock, such as the OPEC oil 
embargoes of 1973 and 1979. 
TABLE 6.5 Inflation Rate and Unemployment Rate, United States, 1960-2006 (For all urban consumers; 1982-1984 = 100, except as noted)

Year  INFLRATE  UNRATE      Year  INFLRATE  UNRATE
1960  1.718     5.5         1983  3.212     9.6
1961  1.014     6.7         1984  4.317     7.5
1962  1.003     5.5         1985  3.561     7.2
1963  1.325     5.7         1986  1.859     7.0
1964  1.307     5.2         1987  3.650     6.2
1965  1.613     4.5         1988  4.137     5.5
1966  2.857     3.8         1989  4.818     5.3
1967  3.086     3.8         1990  5.403     5.6
1968  4.192     3.6         1991  4.208     6.8
1969  5.460     3.5         1992  3.010     7.5
1970  5.722     4.9         1993  2.994     6.9
1971  4.381     5.9         1994  2.561     6.1
1972  3.210     5.6         1995  2.834     5.6
1973  6.220     4.9         1996  2.953     5.4
1974  11.036    5.6         1997  2.294     4.9
1975  9.128     8.5         1998  1.558     4.5
1976  5.762     7.7         1999  2.209     4.2
1977  6.503     7.1         2000  3.361     4.0
1978  7.591     6.1         2001  2.846     4.7
1979  11.350    5.8         2002  1.581     5.8
1980  13.499    7.1         2003  2.279     6.0
1981  10.316    7.6         2004  2.663     5.5
1982  6.161     9.7         2005  3.388     5.1
                            2006  3.226     4.6

Source: Economic Report of the President, 2007, Table B-60, p. 399, for CPI changes


FIGURE 6.9 The modified Phillips curve (horizontal axis: Unemployment rate (%)).

Model (6.7.5) shows that if the unemployment rate goes down by 1 percentage point, on average, the change in the inflation rate goes up by about 0.64 percentage points, and vice versa. Model (6.7.6) shows that even if the unemployment rate increases indefinitely, the most the change in the inflation rate will go down will be about 3.07 percentage points. Incidentally, from Eq. (6.7.5), we can compute the underlying natural rate of unemployment as:

$$U^N = \frac{\hat\beta_1}{-\hat\beta_2} = \frac{3.7844}{0.6385} = 5.9270 \qquad (6.7.7)$$

That is, the natural rate of unemployment is about 5.93%. Economists put the natural rate between 5 and 6%, although in the recent past in the United States the actual rate has been much below this rate.
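Eq. (6.7.7) follows from the restriction $\beta_1 = -\beta_2 U^N$ in Eq. (6.7.4), so $U^N = -\beta_1/\beta_2$. The short Python sketch below reproduces that arithmetic from the estimates of Eq. (6.7.5) and, for the reciprocal model (6.7.6), evaluates the fitted change in inflation at a few unemployment rates to show it approaching the intercept, -3.0684, as unemployment grows.

```python
b1, b2 = 3.7844, -0.6385          # linear model (6.7.5)
natural_rate = -b1 / b2           # from beta1 = -beta2 * U_N in Eq. (6.7.4)
print(f"natural rate of unemployment: {natural_rate:.4f}%")

a1, a2 = -3.0684, 17.2077         # reciprocal model (6.7.6)
for un in (4, 6, 10, 50):
    # Fitted change in inflation; tends to a1 as UN grows without bound
    print(f"UN = {un:>2}%: fitted change in inflation = {a1 + a2 / un:+.4f}")
```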
FIGURE 6.10 The log reciprocal model (axes: X horizontal, Y vertical).

Log Hyperbola or Logarithmic Reciprocal Model 

We conclude our discussion of reciprocal models by considering the logarithmic reciprocal model, which takes the following form:

$$\ln Y_i = \beta_1 - \beta_2\left(\frac{1}{X_i}\right) + u_i \qquad (6.7.8)$$


Its shape is as depicted in Figure 6.10. As this figure shows, initially Y increases at an increasing rate (i.e., the curve is initially convex) and then it increases at a decreasing rate (i.e., the curve becomes concave).$^{24}$ Such a model may therefore be appropriate to model a short-run production function. Recall from microeconomics that if labor and capital are the inputs in a production function and if we keep the capital input constant but increase the labor input, the short-run output-labor relationship will resemble Figure 6.10. (See Example 7.3, Chapter 7.)
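The change from convex to concave can be made precise. Differentiating Eq. (6.7.8) twice (without the error term) gives $d^2Y/dX^2 = (\beta_2 Y / X^4)(\beta_2 - 2X)$, so for $\beta_2 > 0$ the curve is convex for $X < \beta_2/2$ and concave beyond it. The Python sketch below checks this with hypothetical parameter values $\beta_1 = 1$ and $\beta_2 = 4$ (inflection at X = 2), comparing the analytic second derivative with a central-difference approximation.

```python
import math

b1, b2 = 1.0, 4.0   # hypothetical parameters; b2 > 0 gives the shape in Figure 6.10

def y(x):
    """Eq. (6.7.8) without the error term: ln Y = b1 - b2 / X."""
    return math.exp(b1 - b2 / x)

def d2y_numeric(x, h=1e-4):
    """Central-difference approximation to d2Y/dX2."""
    return (y(x + h) - 2 * y(x) + y(x - h)) / h ** 2

def d2y_exact(x):
    """Analytic second derivative: (b2 * Y / X**4) * (b2 - 2X)."""
    return b2 * y(x) / x ** 4 * (b2 - 2 * x)

for x in (0.5, 1.0, 1.5, 2.5, 4.0, 8.0):
    shape = "convex" if d2y_exact(x) > 0 else "concave"
    print(f"X = {x:3.1f}: Y = {y(x):.4f}, d2Y/dX2 = {d2y_numeric(x):+.5f} ({shape})")
```

As X grows, Y approaches its ceiling $e^{\beta_1}$, which is the flattening visible in Figure 6.10.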


24 From calculus, it can be shown that

$$\frac{d}{dX}(\ln Y) = -\beta_2\left(-\frac{1}{X^2}\right) = \beta_2\left(\frac{1}{X^2}\right)$$

But

$$\frac{d}{dX}(\ln Y) = \frac{1}{Y}\,\frac{dY}{dX}$$

Making this substitution, we obtain

$$\frac{dY}{dX} = \beta_2\,\frac{Y}{X^2}$$

which is the slope of Y with respect to X.

TABLE 6.6

Note: * indicates that the elasticity is variable, depending on the value taken by X or Y or both. When no X and Y values are specified, in practice, very often these elasticities are measured at the mean values of these variables, namely, $\bar{X}$ and $\bar{Y}$.

6.8 Choice of Functional Form

In this chapter we discussed several functional forms an empirical model can assume, even within the confines of the linear-in-parameter regression models. The choice of a particular functional form may be comparatively easy in the two-variable case, because we can plot the variables and get some rough idea about the appropriate model. The choice becomes much harder when we consider the multiple regression model involving more than one regressor, as we will discover when we discuss this topic in the next two chapters. There is no denying that a great deal of skill and experience are required in choosing an appropriate model for empirical estimation. But some guidelines can be offered:

1. The underlying theory (e.g., the Phillips curve) may suggest a particular functional 
form. 

2. It is good practice to find out the rate of change (i.e., the slope) of the regressand with 
respect to the regressor as well as to find out the elasticity of the regressand with respect to 
the regressor. For the various models considered in this chapter, we provide the necessary 
formulas for the slope and elasticity coefficients of the various models in Table 6.6. The 
knowledge of these formulas will help us to compare the various models. 

3. The coefficients of the model chosen should satisfy certain a priori expectations. For 
example, if we are considering the demand for automobiles as a function of price and some 
other variables, we should expect a negative coefficient for the price variable. 

4. Sometimes more than one model may fit a given set of data reasonably well. In the modified Phillips curve, we fitted both a linear and a reciprocal model to the same data. In both cases the coefficients were in line with prior expectations and they were all statistically significant. One major difference was that the $r^2$ value of the linear model was larger than that of the reciprocal model. One may therefore give a slight edge to the linear model over the reciprocal model. But make sure that in comparing two $r^2$ values the dependent variable, or the regressand, of the two models is the same; the regressor(s) can take any form. We will explain the reason for this in the next chapter.

5. In general one should not overemphasize the $r^2$ measure in the sense that the higher the $r^2$ the better the model. As we will discuss in the next chapter, $r^2$ increases as we add more regressors to the model. What is of greater importance is the theoretical underpinning of the chosen model, the signs of the estimated coefficients, and their statistical significance. If a model is good on these criteria, a model with a lower $r^2$ may be quite acceptable. We will revisit this important topic in greater depth in Chapter 13.

6. In some situations it may not be easy to settle on a particular functional form, in 
which case we may use the so-called Box-Cox transformations. Since this topic is rather 
technical, we discuss the Box-Cox procedure in Appendix 6A.5. 
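Guideline 2 is easy to put into practice once the slope and elasticity formulas are written down. The Python sketch below collects the standard results obtained by differentiating each functional form discussed in this chapter (the quantities Table 6.6 is meant to summarize); `slope_and_elasticity` and the model labels are hypothetical names chosen for this illustration. The evaluation point (X, Y) must be supplied by the user, for example the sample means, as the note to Table 6.6 suggests.

```python
def slope_and_elasticity(model, b2, x, y):
    """Return (dY/dX, elasticity) at the point (x, y); x and y must be nonzero.

    b2 is the slope coefficient as it appears in each equation below.
    """
    formulas = {
        "linear":         (b2,              b2 * x / y),      # Y = b1 + b2 X
        "log-linear":     (b2 * y / x,      b2),              # ln Y = b1 + b2 ln X
        "log-lin":        (b2 * y,          b2 * x),          # ln Y = b1 + b2 X
        "lin-log":        (b2 / x,          b2 / y),          # Y = b1 + b2 ln X
        "reciprocal":     (-b2 / x ** 2,    -b2 / (x * y)),   # Y = b1 + b2 (1/X)
        "log-reciprocal": (b2 * y / x ** 2, b2 / x),          # ln Y = b1 - b2 (1/X)
    }
    return formulas[model]

# Example: child mortality model (6.7.2) evaluated at a hypothetical PGNP of 1000
pgnp = 1000.0
cm = 81.79436 + 27273.17 / pgnp
slope, elast = slope_and_elasticity("reciprocal", 27273.17, pgnp, cm)
print(f"CM = {cm:.2f}, slope = {slope:.5f}, elasticity = {elast:.4f}")
```

Only the log-linear model has a constant elasticity; in every other case the result depends on where the curve is evaluated, which is why the evaluation point has to be stated.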
*6.9 A Note on the Nature of the Stochastic Error Term: Additive versus Multiplicative Stochastic Error Term

Consider the following regression model, which is the same as Eq. (6.5.1) but without the error term:

$$Y_i = \beta_1 X_i^{\beta_2} \qquad (6.9.1)$$

For estimation purposes, we can express this model in three different forms:

$$Y_i = \beta_1 X_i^{\beta_2} u_i \qquad (6.9.2)$$

$$Y_i = \beta_1 X_i^{\beta_2} e^{u_i} \qquad (6.9.3)$$

$$Y_i = \beta_1 X_i^{\beta_2} + u_i \qquad (6.9.4)$$

Taking the logarithms on both sides of these equations, we obtain

$$\ln Y_i = \alpha + \beta_2 \ln X_i + \ln u_i \qquad (6.9.2a)$$

$$\ln Y_i = \alpha + \beta_2 \ln X_i + u_i \qquad (6.9.3a)$$

$$\ln Y_i = \ln\left(\beta_1 X_i^{\beta_2} + u_i\right) \qquad (6.9.4a)$$

where $\alpha = \ln \beta_1$.

Models like Eqs. (6.9.2) and (6.9.3) are intrinsically linear (in-parameter) regression models in the sense that by suitable (log) transformation the models can be made linear in the parameters $\alpha$ and $\beta_2$. (Note: These models are nonlinear in $\beta_1$.) But model (6.9.4) is intrinsically nonlinear-in-parameter. There is no simple way to take the log of Eq. (6.9.4) because $\ln(A + B) \neq \ln A + \ln B$.

Although Eqs. (6.9.2) and (6.9.3) are linear regression models and can be estimated by ordinary least squares (OLS) or maximum likelihood (ML), we have to be careful about the properties of the stochastic error term that enters these models. Remember that the BLUE property of OLS (best linear unbiased estimator) requires that $u_i$ has zero mean value, constant variance, and zero autocorrelation. For hypothesis testing, we further assume that $u_i$ follows the normal distribution with mean and variance values just discussed. In short, we have assumed that $u_i \sim N(0, \sigma^2)$.

Now consider model (6.9.2). Its statistical counterpart is given in (6.9.2a). To use the classical normal linear regression model (CNLRM), we have to assume that

$$\ln u_i \sim N(0, \sigma^2) \qquad (6.9.5)$$

Therefore, when we run the regression (6.9.2a), we will have to apply the normality tests discussed in Chapter 5 to the residuals obtained from this regression. Incidentally, note that if $\ln u_i$ follows the normal distribution with zero mean and constant variance, then statistical theory shows that $u_i$ in Eq. (6.9.2) must follow the log-normal distribution with mean $e^{\sigma^2/2}$ and variance $e^{\sigma^2}(e^{\sigma^2} - 1)$.
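The log-normal moments quoted above can be checked by simulation. The Python sketch below draws $\ln u_i$ from $N(0, \sigma^2)$ with a hypothetical $\sigma = 0.5$, exponentiates, and compares the sample mean and variance of $u_i$ with $e^{\sigma^2/2}$ and $e^{\sigma^2}(e^{\sigma^2} - 1)$. Note that the mean of $u_i$ is not 1 but $e^{\sigma^2/2}$, which exceeds 1.

```python
import math
import random

random.seed(1)
sigma = 0.5                                  # hypothetical std. dev. of ln u
draws = [math.exp(random.gauss(0.0, sigma)) for _ in range(100_000)]

mean_mc = sum(draws) / len(draws)
var_mc = sum((d - mean_mc) ** 2 for d in draws) / len(draws)

mean_theory = math.exp(sigma ** 2 / 2)
var_theory = math.exp(sigma ** 2) * (math.exp(sigma ** 2) - 1)
print(f"mean:     simulated {mean_mc:.4f}, theory {mean_theory:.4f}")
print(f"variance: simulated {var_mc:.4f}, theory {var_theory:.4f}")
```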

As the preceding analysis shows, one has to pay very careful attention to the error 
term in transforming a model for regression analysis. As for Eq. (6.9.4), this model is a 
nonlinear-in-parameter regression model and will have to be solved by some iterative 
computer routine. Model (6.9.3) should not pose any problems for estimation. 


To sum up, pay very careful attention to the disturbance term when you transform a model for regression analysis. Otherwise, a blind application of OLS to the transformed model will not produce a model with desirable statistical properties.

*Optional.

Summary and Conclusions

This chapter introduced several of the finer points of the classical linear regression model (CLRM).

1. Sometimes a regression model may not contain an explicit intercept term. Such models are known as regression through the origin. Although the algebra of estimating such models is simple, one should use such models with caution. In such models the sum of the residuals need not be zero; additionally, the conventionally computed $r^2$ may not be meaningful. Unless there is a strong theoretical reason, it is better to introduce the intercept in the model explicitly.

2. The units and scale in which the regressand and the regressor(s) are expressed are very 
important because the interpretation of regression coefficients critically depends on 
them. In empirical research the researcher should not only quote the sources of data but 
also state explicitly how the variables are measured. 

3. Just as important is the functional form of the relationship between the regressand and 
the regressor(s). Some of the important functional forms discussed in this chapter are 
(a) the log-linear or constant elasticity model, (b) semilog regression models, and 
(c) reciprocal models. 

4. In the log-linear model both the regressand and the regressor(s) are expressed in the
logarithmic form. The regression coefficient attached to the log of a regressor is interpreted
as the elasticity of the regressand with respect to the regressor.

5. In the semilog model either the regressand or the regressor is in log form. In the
semilog model where the regressand is logarithmic and the regressor X is time, the
estimated slope coefficient (multiplied by 100) measures the (instantaneous) rate of growth
of the regressand. Such models are often used to measure the growth rate of many
economic phenomena. In the semilog model where the regressor is logarithmic, its coefficient
measures the absolute change in the regressand for a given percent change in the
value of the regressor.

6. In the reciprocal models, either the regressand or the regressor is expressed in
reciprocal, or inverse, form to capture nonlinear relationships between economic variables, as
in the celebrated Phillips curve.

7. In choosing the various functional forms, great attention should be paid to the stochastic
disturbance term $u_i$. As noted in Chapter 5, the CLRM explicitly assumes that the
disturbance term has zero mean value and constant (homoscedastic) variance and that it is
uncorrelated with the regressor(s). It is under these assumptions that the OLS estimators are
BLUE. Further, under the CNLRM, the OLS estimators are also normally distributed. One
should therefore find out if these assumptions hold in the functional form chosen for
empirical analysis. After the regression is run, the researcher should apply diagnostic tests,
such as the normality test discussed in Chapter 5. This point cannot be overemphasized, for
the classical tests of hypothesis, such as the t, F, and $\chi^2$, rest on the assumption that the
disturbances are normally distributed. This is especially critical if the sample size is small.

8. Although the discussion so far has been confined to two-variable regression models, the
subsequent chapters will show that in many cases the extension to multiple regression
models simply involves more algebra without necessarily introducing more fundamental
concepts. That is why it is so very important that the reader have a firm grasp of the
two-variable regression model.
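Items 4 and 5 can be checked on noise-free artificial data; the absence of an error term is an assumption made only so that the answers come out exact. In the sketch below, the slope of ln Y on ln X recovers the elasticity 0.8 of $Y = 3X^{0.8}$, and the slope $b_2$ of ln Y on t for $Y_t = 100(1.05)^t$ gives the instantaneous growth rate $100 \times b_2$. The compound rate $100(e^{b_2} - 1)$ is printed alongside for comparison. The helper `ols` is a hypothetical two-variable least-squares function.

```python
import math

def ols(x, y):
    """Two-variable OLS with intercept; returns (b1, b2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b2 * mx, b2

# Item 4: log-linear (constant elasticity) model, Y = 3 * X**0.8
X = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
Y = [3.0 * v ** 0.8 for v in X]
_, elasticity = ols([math.log(v) for v in X], [math.log(v) for v in Y])

# Item 5: semilog growth model, Y_t = 100 * 1.05**t for t = 0, 1, ..., 9
t = list(range(10))
Yt = [100.0 * 1.05 ** s for s in t]
_, b2 = ols(t, [math.log(v) for v in Yt])
instantaneous = 100 * b2              # instantaneous growth rate, percent
compound = 100 * (math.exp(b2) - 1)   # compound growth rate per period, percent

print(f"elasticity: {elasticity:.4f}")
print(f"instantaneous growth: {instantaneous:.3f}%, compound growth: {compound:.3f}%")
```

Here $b_2 = \ln 1.05$, so the instantaneous rate (about 4.88 percent) is slightly below the 5 percent compound rate.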
EXERCISES


Questions 

6.1. Consider the regression model

$$y_i = \beta_1 + \beta_2 x_i + u_i$$

where $y_i = (Y_i - \bar{Y})$ and $x_i = (X_i - \bar{X})$. In this case, the regression line must pass
through the origin. True or false? Show your calculations.

6.2. The following regression results were based on monthly data over the period January
1978 to December 1987:

$$\hat{Y}_t = 0.00681 + 0.75815\,X_t$$
se = (0.02596) (0.27009)
t = (0.26229) (2.80700)
p value = (0.7984) (0.0186)    $r^2 = 0.4406$

$$\hat{Y}_t = 0.76214\,X_t$$
se = (0.265799)
t = (2.95408)
p value = (0.0131)    $r^2 = 0.43684$

where Y = monthly rate of return on Texaco common stock, %, and X = monthly
market rate of return, %.*

a. What is the difference between the two regression models? 

b. Given the preceding results, would you retain the intercept term in the first 
model? Why or why not? 

c. How would you interpret the slope coefficients in the two models? 

d. What is the theory underlying the two models? 

e. Can you compare the $r^2$ terms of the two models? Why or why not?

f. The Jarque-Bera normality statistic for the first model in this problem is 1.1167
and for the second model it is 1.1170. What conclusions can you draw from these
statistics?

g. The t value of the slope coefficient in the zero intercept model is about 2.95, 
whereas that with the intercept present is about 2.81. Can you rationalize this 
result? 

6.3. Consider the following regression model:

$$\frac{1}{Y_i} = \beta_1 + \beta_2\left(\frac{1}{X_i}\right) + u_i$$

Note: Neither Y nor X assumes zero value.

a. Is this a linear regression model? 

b. How would you estimate this model? 

c. What is the behavior of Y as X tends to infinity? 

d. Can you give an example where such a model may be appropriate? 


*The underlying data were obtained from the data diskette included in Ernst R. Berndt, The Practice of 
Econometrics: Classic and Contemporary, Addison-Wesley, Reading, Mass., 1991. 
6.4. Consider the log-linear model:

$$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i$$

Plot Y on the vertical axis and X on the horizontal axis. Draw the curves showing the
relationship between Y and X when $\beta_2 = 1$, when $\beta_2 > 1$, and when $\beta_2 < 1$.

6.5. Consider the following models:

Model I: $Y_i = \beta_1 + \beta_2 X_i + u_i$

Model II: $Y_i^* = \alpha_1 + \alpha_2 X_i^* + u_i$

where $Y^*$ and $X^*$ are standardized variables. Show that $\hat{\alpha}_2 = \hat{\beta}_2(S_x/S_y)$, where $S_x$ and $S_y$
are the sample standard deviations of X and Y, and hence establish that although the regression
slope coefficients are independent of the change of origin they are not independent of the
change of scale.

6.6. Consider the following models: 

$$\ln Y_i^* = \alpha_1 + \alpha_2 \ln X_i^* + u_i^*$$

$$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i$$

where $Y_i^* = w_1 Y_i$ and $X_i^* = w_2 X_i$, the w's being constants.

a. Establish the relationships between the two sets of regression coefficients and
their standard errors.

b. Is the $r^2$ different between the two models?

6.7. Between regressions (6.6.8) and (6.6.10), which model do you prefer? Why? 

6.8. For the regression (6.6.8), test the hypothesis that the slope coefficient is not
significantly different from 0.005.

6.9. From the Phillips curve given in Eq. (6.7.3), is it possible to estimate the natural rate 
of unemployment? How? 

6.10. The Engel expenditure curve relates a consumer’s expenditure on a commodity to his 
or her total income. Letting Y = consumption expenditure on a commodity and X = 
consumer income, consider the following models: 

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

$$Y_i = \beta_1 + \beta_2 (1/X_i) + u_i$$

$$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_i + u_i$$

$$\ln Y_i = \ln \beta_1 + \beta_2 (1/X_i) + u_i$$

$$Y_i = \beta_1 + \beta_2 \ln X_i + u_i$$

Which of these model(s) would you choose for the Engel expenditure curve and 
why? (Hint: Interpret the various slope coefficients, find out the expressions for 
elasticity of expenditure with respect to income, etc.) 

6.11. Consider the following model: 

$$Y_i = \frac{e^{\beta_1 + \beta_2 X_i}}{1 + e^{\beta_1 + \beta_2 X_i}}$$

As it stands, is this a linear regression model? If not, what “trick,” if any, can you use 
to make it a linear regression model? How would you interpret the resulting model? 
Under what circumstances might such a model be appropriate? 
6.12. Graph the following models (for ease of exposition, we have omitted the observation
subscript, i):

a. $Y = \beta_1 X^{\beta_2}$, for $\beta_2 > 1$, $\beta_2 = 1$, $0 < \beta_2 < 1$, ....

b. $Y = \beta_1 e^{\beta_2 X}$, for $\beta_2 > 0$ and $\beta_2 < 0$.

Discuss where such models might be appropriate. 

6.13. Consider the following regression:* 

$$\widehat{\text{SPI}}_i = -17.8 + 33.2\,\text{Gini}_i$$
se = (4.9) (11.8)    $r^2 = 0.16$

where SPI = index of sociopolitical instability, average for 1960-1985, and Gini =
Gini coefficient for 1975 or the closest available year within the range of 1970-1980.
The sample consists of 40 countries.

The Gini coefficient is a measure of income inequality and it lies between 0 and 1. 
The closer it is to 0, the greater the income equality, and the closer it is to 1, the 
greater the income inequality. 

a. How do you interpret this regression? 

b. Suppose the Gini coefficient increases from 0.25 to 0.55. By how much does SPI 
go up? What does that mean in practice? 

c. Is the estimated slope coefficient statistically significant at the 5% level? Show the 
necessary calculations. 

d. Based on the preceding regression, can you argue that countries with greater
income inequality are politically unstable?

Empirical Exercises 

6.14. You are given the data in Table 6.7.** Fit the following model to these data and obtain 
the usual regression statistics and interpret the results: 



TABLE 6.7

$Y_i$:  86   79   76   69   65   62   52   51   51   48
$X_i$:   3    7   12   17   25   35   45   55   70  120


6.15. To study the relationship between investment rate (investment expenditure as a ratio
of GDP) and savings rate (savings as a ratio of GDP), Martin Feldstein and
Charles Horioka obtained data for a sample of 21 countries. (See Table 6.8.) The
investment rate for each country is the average rate for the period 1960-1974 and the
savings rate is the average savings rate for the period 1960-1974. The variable Invrate
represents the investment rate and the variable Savrate represents the savings rate.†
a. Plot the investment rate against the savings rate. 


*See David N. Weil, Economic Growth, Addison-Wesley, Boston, 2005, p. 392.

**Adapted from J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, p. 87.
Actually this is taken from an econometric examination of Oxford University in 1975.

†Martin Feldstein and Charles Horioka, "Domestic Saving and International Capital Flows," Economic
Journal, vol. 90, June 1980, pp. 314-329. Data reproduced from Michael P. Murray, Econometrics: A
Modern Introduction, Addison-Wesley, Boston, 2006.






Chapter 6 Extensions of the Two-Variable Linear Regression Model 179 


TABLE 6.8

Country        SAVRATE   INVRATE
Australia        0.250     0.270
Austria          0.285     0.282
Belgium          0.235     0.224
Canada           0.219     0.231
Denmark          0.202     0.224
Finland          0.288     0.305
France           0.254     0.260
Germany          0.271     0.264
Greece           0.219     0.248
Ireland          0.190     0.218
Italy            0.235     0.224
Japan            0.372     0.368
Luxembourg       0.313     0.277
Netherlands      0.273     0.266
New Zealand      0.232     0.249
Norway           0.278     0.299
Spain            0.235     0.241
Sweden           0.241     0.242
Switzerland      0.297     0.297
U.K.             0.184     0.192
U.S.             0.186     0.186


Note: SAVRATE = Savings as a ratio of GDP. 
INVRATE = Investment expenditure as a ratio of GDP. 


b. Based on this plot, do you think the following models might fit the data equally 
well? 

$$\text{Invrate}_i = \beta_1 + \beta_2\,\text{Savrate}_i + u_i$$

$$\ln \text{Invrate}_i = \alpha_1 + \alpha_2 \ln \text{Savrate}_i + u_i$$

c. Estimate both of these models and obtain the usual statistics. 

d. How would you interpret the slope coefficient in the linear model? In the log- 
linear model? Is there a difference in the interpretation of these coefficients? 

e. How would you interpret the intercepts in the two models? Is there a difference in 
your interpretation? 

f. Would you compare the two $r^2$ coefficients? Why or why not?

g. Suppose you want to compute the elasticity of the investment rate with respect to 
the savings rate. How would you obtain this elasticity for the linear model? For 
the log-linear model? Note that this elasticity is defined as the percentage change 
in the investment rate for a percentage change in the savings rate. 

h. Given the results of the two regression models, which model would you prefer? 
Why? 

6.16. Table 6.9* gives the variable definitions for various kinds of expenditures, total
expenditure, income, age of household head, and the number of children for a sample of
1,519 households drawn from the 1980-1982 British Family Expenditure Surveys.


*The data are from Richard Blundell and Krishna Pendakur, "Semiparametric Estimation and
Consumer Demand," Journal of Applied Econometrics, vol. 13, no. 5, 1998, pp. 435-462. Data
reproduced from R. Carter Hill, William E. Griffiths, and George G. Judge, Undergraduate Econometrics,
2d ed., John Wiley & Sons, New York, 2001.
TABLE 6.9

List of Variables:

wfood = budget share for food expenditure
wfuel = budget share for fuel expenditure
wcloth = budget share for clothing expenditure
walc = budget share for alcohol expenditure
wtrans = budget share for transportation expenditure
wother = budget share for other expenditures

totexp = total household expenditure (rounded to the nearest 10 U.K. pounds sterling)
income = total net household income (rounded to the nearest 10 U.K. pounds sterling)
age = age of household head
nk = number of children

The budget share of a commodity, say food, is defined as:

$$\text{wfood} = \frac{\text{expenditure on food}}{\text{total expenditure}}$$

The actual dataset can be found on this text's website. The data include only households
with one or two children living in Greater London. The sample does not include
self-employed or retired households.

a. Using the data on food expenditure in relation to total expenditure, determine which 
of the models summarized in Table 6.6 fits the data. 

b. Based on the regression results obtained in (a), which model seems appropriate in the 
present instance? 

Note: Save these data for further analysis in the next chapter on multiple regression. 

6.17. Refer to Table 6.3. Find out the rate of growth of expenditure on durable goods. What is
the estimated semielasticity? Interpret your results. Would it make sense to run a double-log
regression with expenditure on durable goods as the regressand and time as the
regressor? How would you interpret the slope coefficient in this case?

6.18. From the data given in Table 6.3, find out the growth rate of expenditure on nondurable 
goods and compare your results with those obtained from Exercise 6.17. 

6.19. Table 6.10 gives data for the U.K. on total consumer expenditure (in £ millions) and 
advertising expenditure (in £ millions) for 29 product categories.* 

a. Considering the various functional forms we have discussed in the chapter, which 
functional form might fit the data given in Table 6.10? 

b. Estimate the parameters of the chosen regression model and interpret your results. 

c. If you take the ratio of advertising expenditure to total consumer expenditure, what do 
you observe? Are there any product categories for which this ratio seems unusually 
high? Is there anything special about these product categories that might explain the 
relatively high expenditure on advertising? 

6.20. Refer to Example 3.3 in Chapter 3 to complete the following: 

a. Plot cell phone demand against purchasing power (PP) adjusted per capita income. 

b. Plot the log of cell phone demand against the log of PP-adjusted per capita income. 

c. What is the difference between the two graphs? 

d. From these two graphs, do you think that a double-log model might provide a better fit 
to the data than the linear model? Estimate the double-log model. 

e. How do you interpret the slope coefficient in the double-log model? 

f. Is the estimated slope coefficient in the double-log model statistically significant at the 
5% level? 

*These data are from Advertising Statistics Year Book, 1996, and are reproduced from
http://www.Economicswebinstitute.org/ecdata.htm.








TABLE 6.10
Advertising Expenditure and Total Expenditure (in £ millions) for 29 Product Categories in the U.K.

obs      ADEXP      CONEXP      RATIO
1      87957.00    13599.00    0.006468
2      23578.00    4699.000    0.005018
3      16345.00    5473.000    0.002986
4      6550.000    6119.000    0.001070
5      10230.00    8811.000    0.001161
6      9127.000    1142.000    0.007992
7      1675.000    143.0000    0.011713
8      1110.000    138.0000    0.008043
9      3351.000    85.00000    0.039424
10     1140.000    108.0000    0.010556
11     6376.000    307.0000    0.020769
12     4500.000    1545.000    0.002913
13     1899.000    943.0000    0.002014
14     10101.00    369.0000    0.027374
15     3831.000    285.0000    0.013442
16     99528.00    1052.000    0.094608
17     15855.00    862.0000    0.018393
18     8827.000    84.00000    0.105083
19     54517.00    1174.000    0.046437
20     49593.00    2531.000    0.019594
21     39664.00    408.0000    0.097216
22     327.0000    295.0000    0.001108
23     22549.00    488.0000    0.046207
24     416422.0    19200.00    0.021689
25     14212.00    94.00000    0.151191
26     54174.00    5320.000    0.010183
27     20218.00    357.0000    0.056633
28     11041.00    159.0000    0.069440
29     22542.00    244.0000    0.092385

Note: ADEXP = Advertising expenditure (£, millions)

CONEXP = Total consumer expenditure (£, millions)

g. How would you estimate the elasticity of cell phone demand with respect to PP-adjusted
income for the linear model given in Eq. (3.7.3)? What additional information, if any,
do you need? Call the estimated elasticity the income elasticity.

h. Is there a difference between the income elasticity estimated from the double-log model 
and that estimated from the linear model? If so, which model would you choose? 

6.21. Repeat Exercise 6.20 but refer to the demand for personal computers given in Eq. (3.7.4). 
Is there a difference between the estimated income elasticities for cell phones and 
personal computers? If so, what factors might account for the difference? 

6.22. Refer to the data in Table 3.3. To find out if people who own PCs also own cell phones, 
run the following regression: 

$$\text{CellPhone}_i = \beta_1 + \beta_2\,\text{PCs}_i + u_i$$

a. Estimate the parameters of this regression. 

b. Is the estimated slope coefficient statistically significant? 

c. Does it matter if you run the following regression? 

$$\text{PCs}_i = \alpha_1 + \alpha_2\,\text{CellPhone}_i + u_i$$

d. Estimate the preceding regression and test the statistical significance of the estimated 
slope coefficient. 

e. How would you decide between the first and the second regression? 








182 Part One Single-Equation Regression Models 


Appendix 6A 


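6A.1 Derivation of Least-Squares Estimators for Regression through the Origin

We want to minimize

$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{\beta}_2 X_i)^2 \tag{1}$$

with respect to $\hat{\beta}_2$.

Differentiating (1) with respect to $\hat{\beta}_2$, we obtain

$$\frac{d \sum \hat{u}_i^2}{d \hat{\beta}_2} = 2 \sum (Y_i - \hat{\beta}_2 X_i)(-X_i) \tag{2}$$

Setting Eq. (2) equal to zero and simplifying, we get

$$\hat{\beta}_2 = \frac{\sum X_i Y_i}{\sum X_i^2} \tag*{(6.1.6) = (3)}$$

Now substituting the PRF $Y_i = \beta_2 X_i + u_i$ into this equation, we obtain

$$\hat{\beta}_2 = \frac{\sum X_i(\beta_2 X_i + u_i)}{\sum X_i^2} = \beta_2 + \frac{\sum X_i u_i}{\sum X_i^2} \tag{4}$$

[Note: $E(\hat{\beta}_2) = \beta_2$.] Therefore,

$$E(\hat{\beta}_2 - \beta_2)^2 = E\left[\frac{\sum X_i u_i}{\sum X_i^2}\right]^2 \tag{5}$$

Expanding the right-hand side of Eq. (5) and noting that the $X_i$ are nonstochastic and the $u_i$ are
homoscedastic and uncorrelated, we obtain

$$\operatorname{var}(\hat{\beta}_2) = E(\hat{\beta}_2 - \beta_2)^2 = \frac{\sigma^2}{\sum X_i^2} \tag*{(6.1.7) = (6)}$$

Incidentally, note that from Eq. (2) we get, after equating it to zero,

$$\sum \hat{u}_i X_i = 0 \tag{7}$$

From Appendix 3A, Section 3A.1, we see that when the intercept term is present in the model, we get
in addition to Eq. (7) the condition $\sum \hat{u}_i = 0$. From the mathematics just given it should be clear why
the regression through the origin model may not have the error sum, $\sum \hat{u}_i$, equal to zero.

Suppose we want to impose the condition that $\sum \hat{u}_i = 0$. In that case we have

$$\sum Y_i = \hat{\beta}_2 \sum X_i + \sum \hat{u}_i = \hat{\beta}_2 \sum X_i, \quad \text{since } \sum \hat{u}_i = 0 \text{ by construction} \tag{8}$$

This expression then gives

$$\hat{\beta}_2 = \frac{\sum Y_i}{\sum X_i} = \frac{\bar{Y}}{\bar{X}} \tag{9}$$

But this estimator is not the same as Eq. (3) above or Eq. (6.1.6). When the $X_i$ are nonstochastic, the
$\hat{\beta}_2$ of Eq. (9) is in fact also unbiased, since $E(\sum Y_i/\sum X_i) = \beta_2 + E(\sum u_i)/\sum X_i = \beta_2$. However, by
the Gauss-Markov theorem the $\hat{\beta}_2$ of Eq. (3) has the smallest variance among linear unbiased
estimators in this model, so the $\hat{\beta}_2$ of Eq. (9), whose variance is $n\sigma^2/(\sum X_i)^2 \ge \sigma^2/\sum X_i^2$, cannot be best.

The upshot is that, in regression through the origin, we cannot have both $\sum \hat{u}_i X_i$ and $\sum \hat{u}_i$ equal
to zero, as in the conventional model. The only condition that is satisfied is that $\sum \hat{u}_i X_i$ is zero.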
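The contrast between Eq. (3) and Eq. (9) can be seen numerically. Under the appendix's assumptions (nonstochastic $X_i$, homoscedastic and uncorrelated $u_i$), Eq. (3) has variance $\sigma^2/\sum X_i^2$, while $\sum Y_i/\sum X_i$ has variance $n\sigma^2/(\sum X_i)^2$, which by the Cauchy-Schwarz inequality is never smaller. The X values, $\sigma^2 = 4$, $\beta_2 = 1.5$, and the error values below are hypothetical.

```python
# Hypothetical nonstochastic X values, true beta2 = 1.5, error variance sigma^2 = 4
X = [1.0, 2.0, 3.0, 5.0, 9.0]
sigma2 = 4.0
n = len(X)

var_eq3 = sigma2 / sum(x * x for x in X)   # Eq. (6): sigma^2 / sum X^2
var_eq9 = n * sigma2 / sum(X) ** 2         # variance of sum(Y) / sum(X)

# One illustrative sample with made-up error values
u = [0.4, -1.1, 0.7, -0.3, 0.2]
Y = [1.5 * x + e for x, e in zip(X, u)]

b_eq3 = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)   # Eq. (3)
b_eq9 = sum(Y) / sum(X)                                            # Eq. (9)
resid3 = [y - b_eq3 * x for x, y in zip(X, Y)]                     # residuals from Eq. (3)

print(f"Eq. (3): b2 = {b_eq3:.4f}, variance = {var_eq3:.4f}")
print(f"Eq. (9): b2 = {b_eq9:.4f}, variance = {var_eq9:.4f}")
print(f"sum of u_hat * X = {sum(e * x for e, x in zip(resid3, X)):.1e}")  # Eq. (7): zero
print(f"sum of u_hat     = {sum(resid3):.4f}")                           # generally nonzero
```

The residuals from Eq. (3) satisfy Eq. (7) up to rounding, but their sum is not zero, which is exactly the point made above.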
Recall that

$$Y_i = \hat{Y}_i + \hat{u}_i \tag{2.6.3}$$

Summing this equation on both sides and dividing by $N$, the sample size, we obtain

$$\bar{Y} = \bar{\hat{Y}} + \bar{\hat{u}} \tag{10}$$

Since for the zero-intercept model $\sum \hat{u}_i$ and, therefore, $\bar{\hat{u}}$ need not be zero, it then follows that

$$\bar{Y} \neq \bar{\hat{Y}} \tag{11}$$

that is, the mean of actual Y values need not be equal to the mean of the estimated Y values; the two
mean values are identical for the intercept-present model, as can be seen from Eq. (3.1.10).

It was noted that, for the zero-intercept model, $r^2$ can be negative, whereas for the conventional
model it can never be negative. This condition can be shown as follows.

Using Eq. (3.5.5a), we can write

$$r^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2} \tag{12}$$

Now for the conventional, or intercept-present, model, Eq. (3.3.6) shows that

$$\text{RSS} = \sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2^2 \sum x_i^2 \le \sum y_i^2 \tag{13}$$

with strict inequality unless $\hat{\beta}_2$ is zero (i.e., X has no influence on Y whatsoever). That is, for the
conventional model, RSS $\le$ TSS, or, $r^2$ can never be negative.

For the zero-intercept model it can be shown analogously that

$$\text{RSS} = \sum \hat{u}_i^2 = \sum Y_i^2 - \hat{\beta}_2^2 \sum X_i^2 \tag{14}$$

(Note: The sums of squares of Y and X are not mean-adjusted.) Now there is no guarantee that this
RSS will always be less than $\sum y_i^2 = \sum Y_i^2 - N\bar{Y}^2$ (the TSS), which suggests that RSS can be
greater than TSS, implying that $r^2$, as conventionally defined, can be negative. Incidentally, notice that
in this case RSS will be greater than TSS if $\hat{\beta}_2^2 \sum X_i^2 < N\bar{Y}^2$.
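The following sketch, on a small hypothetical data set, shows Eqs. (11), (12), and (14) at work: Y has a large mean but hardly varies with X, so $\hat{\beta}_2^2 \sum X_i^2 < N\bar{Y}^2$ and the conventional $r^2$ is negative. For comparison it also computes the raw $r^2 = (\sum X_i Y_i)^2/(\sum X_i^2 \sum Y_i^2)$, a measure sometimes reported for zero-intercept models that always lies between 0 and 1; it is not derived in this appendix.

```python
# Hypothetical data: Y has a large mean but hardly varies with X
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [10.0, 9.0, 11.0, 10.0, 10.0]
n = len(X)

sum_xy = sum(x * y for x, y in zip(X, Y))
sum_xx = sum(x * x for x in X)
sum_yy = sum(y * y for y in Y)

b2 = sum_xy / sum_xx                          # zero-intercept OLS slope, Eq. (3)
fitted = [b2 * x for x in X]
resid = [y - f for y, f in zip(Y, fitted)]

y_bar = sum(Y) / n
fitted_bar = sum(fitted) / n                  # differs from y_bar, Eq. (11)
rss = sum(e * e for e in resid)               # equals sum_yy - b2**2 * sum_xx, Eq. (14)
tss = sum((y - y_bar) ** 2 for y in Y)        # mean-adjusted TSS
r2_conventional = 1 - rss / tss               # Eq. (12); negative for these data
r2_raw = sum_xy ** 2 / (sum_xx * sum_yy)      # raw r^2, always between 0 and 1

print(f"mean Y = {y_bar:.3f}, mean fitted Y = {fitted_bar:.3f}")
print(f"RSS = {rss:.3f}, TSS = {tss:.3f}")
print(f"conventional r^2 = {r2_conventional:.3f}, raw r^2 = {r2_raw:.3f}")
print("beta2^2 * sum X^2 < N * Ybar^2:", b2 ** 2 * sum_xx < n * y_bar ** 2)
```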


6A.2 Proof that a Standardized Variable Has Zero Mean and Unit Variance

Consider the random variable (r.v.) Y with the (sample) mean value of $\bar{Y}$ and (sample) standard
deviation of $S_y$. Define

$$Y_i^* = \frac{Y_i - \bar{Y}}{S_y} \tag{15}$$

Hence $Y_i^*$ is a standardized variable. Notice that standardization involves a dual operation: (1) change
of the origin, which is the numerator of Eq. (15), and (2) change of scale, which is the denominator.
Thus, standardization involves both a change of the origin and change of scale.

Now

$$\bar{Y}^* = \frac{1}{S_y}\,\frac{\sum (Y_i - \bar{Y})}{n} = 0 \tag{16}$$

since the sum of deviations of a variable from its mean value is always zero. Hence the mean value of
the standardized variable is zero. (Note: We could pull out the $S_y$ term from the summation sign because
its value is known.)

Now

$$S_{y^*}^2 = \frac{1}{S_y^2}\sum \frac{(Y_i - \bar{Y})^2}{n - 1} = \frac{(n - 1)S_y^2}{(n - 1)S_y^2} = 1 \tag{17}$$

Note that

$$S_y^2 = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$$

which is the sample variance of Y.
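A quick numerical check of Eqs. (15) to (17), using Python's `statistics` module, whose `stdev` and `variance` use the $n - 1$ divisor as in the text; the sample and the helper name `standardize` are hypothetical. The last part illustrates that a change of origin and scale, $Z = 2.5Y + 100$, leaves the standardized values unchanged, which is why standardized variables are free of units.

```python
import statistics

def standardize(values):
    """Y* = (Y - Ybar) / S_y with S_y the sample s.d. (divisor n - 1), Eq. (15)."""
    y_bar = statistics.mean(values)
    s_y = statistics.stdev(values)
    return [(v - y_bar) / s_y for v in values]

Y = [3.0, 7.0, 8.0, 12.0, 15.0]          # hypothetical sample
Y_star = standardize(Y)
print(f"mean of Y*: {statistics.mean(Y_star):.12f}")          # Eq. (16)
print(f"variance of Y*: {statistics.variance(Y_star):.12f}")  # Eq. (17)

# A change of origin and scale (Z = 2.5 Y + 100) gives the same standardized values
Z_star = standardize([2.5 * v + 100.0 for v in Y])
print(all(abs(a - b) < 1e-12 for a, b in zip(Y_star, Z_star)))
```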

6A.3 Logarithms 

Consider the numbers 5 and 25. We know that

$$25 = 5^2 \tag{18}$$

We say that the exponent 2 is the logarithm of 25 to the base 5. More formally, the logarithm of a
number (e.g., 25) to a given base (e.g., 5) is the power (2) to which the base (5) must be raised to
obtain the given number (25).

More generally, if

$$Y = b^X \quad (b > 0,\ b \neq 1) \tag{19}$$

then

$$\log_b Y = X \tag{20}$$

In mathematics the function (19) is called an exponential function and the function (20) is called the
logarithmic function. As is clear from Eqs. (19) and (20), one function is the inverse of the other function.

Although any (positive) base can be used, in practice, the two commonly used bases are 10 and the 
mathematical number e = 2.71828 .... 

Logarithms to base 10 are called common logarithms. Thus,

$$\log_{10} 100 = 2 \qquad \log_{10} 30 \approx 1.48$$

That is, in the first case, $100 = 10^2$ and in the latter case, $30 \approx 10^{1.48}$.

Logarithms to the base e are called natural logarithms. Thus,

$$\log_e 100 \approx 4.6052 \quad \text{and} \quad \log_e 30 \approx 3.4012$$

All these calculations can be done routinely on a hand calculator.

By convention, the logarithm to base 10 is denoted by the letters log and to the base e by ln. Thus,
in the preceding example, we can write log 100 or log 30 or ln 100 or ln 30.

There is a fixed relationship between the common log and natural log, which is

$$\ln X = 2.3026 \log X \tag{21}$$

That is, the natural log of the number X is equal to 2.3026 ($\approx \ln 10$) times the log of X to the base 10. Thus,

$$\ln 30 = 2.3026 \log 30 = 2.3026(1.4771) = 3.4012 \text{ (approx.)}$$

as before. Therefore, it does not matter whether one uses common or natural logs. But in mathematics
the base that is usually preferred is e, that is, the natural logarithm. Hence, in this book all logs are
natural logs, unless stated otherwise. Of course, we can convert the log of a number from one base to
the other using Eq. (21).

Keep in mind that logarithms of negative numbers (and of zero) are not defined. Thus, the log of (-5)
or ln(-5) is not defined.

Some properties of logarithms are as follows: If A and B are any positive numbers, then it can be 
shown that: 

1. $\ln(A \times B) = \ln A + \ln B$ (22)

That is, the log of the product of two (positive) numbers A and B is equal to the sum of their logs.

2. $\ln(A/B) = \ln A - \ln B$ (23)

That is, the log of the ratio of A to B is the difference in the logs of A and B.

3. $\ln(A \pm B) \neq \ln A \pm \ln B$ (24)

That is, the log of the sum or difference of A and B is not equal to the sum or difference of
their logs.

4. $\ln(A^k) = k \ln A$ (25)

That is, the log of A raised to power k is k times the log of A.

5. $\ln e = 1$ (26)

That is, the log of e to itself as a base is 1 (as is the log of 10 to the base 10).

6. $\ln 1 = 0$ (27)

That is, the natural log of the number 1 is zero (as is the common log of number 1).

7. If $Y = \ln X$, then $dY/dX = 1/X$ (28)

That is, the rate of change (i.e., the derivative) of Y with respect to X is 1 over X. The exponential
and (natural) logarithmic functions are depicted in Figure 6A.1.
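The change-of-base rule (21) and properties (22) to (28) can be verified numerically with Python's `math` module. In this sketch A = 4, B = 7, and k = 3 are arbitrary choices, property (28) is checked with a central finite difference rather than symbolically, and `ln_via_common_log` is a hypothetical helper.

```python
import math

def ln_via_common_log(x):
    """Eq. (21): ln x = (ln 10) * log10 x, where ln 10 = 2.302585... (about 2.3026)."""
    return math.log(10) * math.log10(x)

print(f"ln 10 = {math.log(10):.4f}")
print(f"ln 30 = {math.log(30):.4f}, via Eq. (21): {ln_via_common_log(30):.4f}")

A, B, k = 4.0, 7.0, 3.0
checks = {
    "(22) ln(AB) = ln A + ln B": math.isclose(math.log(A * B), math.log(A) + math.log(B)),
    "(23) ln(A/B) = ln A - ln B": math.isclose(math.log(A / B), math.log(A) - math.log(B)),
    "(24) ln(A+B) != ln A + ln B": not math.isclose(math.log(A + B), math.log(A) + math.log(B)),
    "(25) ln(A^k) = k ln A": math.isclose(math.log(A ** k), k * math.log(A)),
    "(26) ln e = 1": math.isclose(math.log(math.e), 1.0),
    "(27) ln 1 = 0": math.log(1.0) == 0.0,
}
# (28): a central-difference slope of ln X at X = A should be close to 1/A
h = 1e-6
slope = (math.log(A + h) - math.log(A - h)) / (2 * h)
checks["(28) d ln X / dX = 1/X"] = math.isclose(slope, 1 / A, rel_tol=1e-8)

for name, ok in checks.items():
    print(f"{name}: {ok}")
```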

Although the number whose log is taken is always positive, the logarithm of that number can be
positive as well as negative. It can be easily verified that if

$0 < Y < 1$ then $\ln Y < 0$
$Y = 1$ then $\ln Y = 0$
$Y > 1$ then $\ln Y > 0$

Also note that although the logarithmic curve shown in Figure 6A.1(b) is positively sloping,
implying that the larger the number is, the larger its logarithmic value will be, the curve is increasing
at a decreasing rate (mathematically, the second derivative of the function is negative). Thus, ln(10) =
2.3026 (approx.) and ln(20) = 2.9957 (approx.). That is, if a number is doubled, its logarithm does
not double.

This is why the logarithm transformation is called a nonlinear transformation. This can also be
seen from Eq. (28), which notes that if $Y = \ln X$, $dY/dX = 1/X$. This means that the slope of the
logarithmic function depends on the value of X; that is, it is not constant (recall the definition of
linearity in the variable).

Logarithms and percentages: Since $d(\ln X)/dX = 1/X$, or $d(\ln X) = dX/X$, for very small changes the
change in ln X is equal to the relative or proportional change in X. In practice, if the change in X is
reasonably small, the preceding relationship can be written as: the change in ln X $\approx$ the relative
change in X, where $\approx$ means approximately.


FIGURE 6A.1 Exponential and logarithmic functions: (a) exponential function; (b) logarithmic function.
Thus, for small changes,

$$(\ln X_t - \ln X_{t-1}) \approx \frac{X_t - X_{t-1}}{X_{t-1}} = \text{relative change in } X$$
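A short sketch comparing the change in ln X with the relative change in X for a small, a moderate, and a large change; the values are arbitrary, and the two helper names are illustrative.

```python
import math

def log_change(x_prev, x_now):
    """Change in ln X between two periods."""
    return math.log(x_now) - math.log(x_prev)

def relative_change(x_prev, x_now):
    """Proportional change (X_t - X_{t-1}) / X_{t-1}."""
    return (x_now - x_prev) / x_prev

for x_prev, x_now in [(100.0, 101.0), (100.0, 105.0), (100.0, 150.0)]:
    dl = log_change(x_prev, x_now)
    rel = relative_change(x_prev, x_now)
    print(f"{x_prev:g} -> {x_now:g}: change in ln X = {dl:.5f}, relative change = {rel:.5f}")
```

For a 1 percent change the two measures differ by only about 0.00005, but for a 50 percent change the log difference (about 0.405) is well below 0.5, so the approximation should be reserved for small changes.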


6A.4 Growth Rate Formulas 


Let the variable Y be a function of time, $Y = f(t)$, where t denotes time. The instantaneous (i.e., a
point in time) rate of growth of Y, $g_Y$, is defined as

$$g_Y = \frac{dY/dt}{Y} = \frac{1}{Y}\frac{dY}{dt} \tag{29}$$

Note that if we multiply $g_Y$ by 100, we get the percent rate of growth, where $dY/dt$ is the rate of change
of Y with respect to time.

Now if we let $\ln Y = \ln f(t)$, where ln stands for the natural logarithm, then

$$\frac{d \ln Y}{dt} = \frac{1}{Y}\frac{dY}{dt} \tag{30}$$

This is the same as Eq. (29).

Therefore, logarithmic transformations are very useful in computing growth rates, especially if Y 
is a function of some other time-dependent variables, as the following example will show. Let 


$$Y = X \cdot Z \tag{31}$$


where Y is nominal GDP, X is real GDP, and Z is the (GDP) price deflator. In words, the nominal GDP 
is real GDP multiplied by the (GDP) price deflator. All these variables are functions of time, as they 
vary over time. 


Now taking logs on both sides of Eq. (31), we obtain:

$$\ln Y = \ln X + \ln Z \tag{32}$$

Differentiating Eq. (32) with respect to time, we get

$$\frac{1}{Y}\frac{dY}{dt} = \frac{1}{X}\frac{dX}{dt} + \frac{1}{Z}\frac{dZ}{dt} \tag{33}$$

that is, $g_Y = g_X + g_Z$, where g denotes growth rate.

In words, the instantaneous rate of growth of Y is equal to the sum of the instantaneous rate of 
growth of X plus the instantaneous rate of growth of Z. In the present example, the instantaneous rate 
of growth of nominal GDP is equal to the sum of the instantaneous rate of growth of real GDP and 
the instantaneous rate of growth of the GDP price deflator. 

More generally, the instantaneous rate of growth of a product is the sum of the instantaneous rates 
of growth of its components. This can be generalized to the product of more than two variables. 

In similar fashion, if we have

$$Y = \frac{X}{Z} \tag{34}$$

then

$$\frac{1}{Y}\frac{dY}{dt} = \frac{1}{X}\frac{dX}{dt} - \frac{1}{Z}\frac{dZ}{dt} \tag{35}$$

that is, $g_Y = g_X - g_Z$. In other words, the instantaneous rate of growth of Y is the difference between
the instantaneous rate of growth of X and the instantaneous rate of growth of Z. Thus if Y = per capita
income, X = GDP, and Z = population, then the instantaneous rate of growth of per capita income is
equal to the instantaneous rate of growth of GDP minus the instantaneous rate of growth of population.
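The product and ratio rules can be checked with a central-difference approximation to $d \ln f(t)/dt$, as in Eq. (30). The time paths below are hypothetical and deliberately not exponential, to show that the rules do not depend on constant growth; `growth_rate` is an illustrative helper, not a library function.

```python
import math

# Hypothetical, deliberately non-exponential time paths
def X(t):
    return 1000.0 + 50.0 * t          # e.g., real GDP

def Z(t):
    return 1.2 + 0.01 * t ** 2        # e.g., price deflator (or population)

def growth_rate(f, t, h=1e-5):
    """Instantaneous growth rate d ln f(t) / dt, by central difference."""
    return (math.log(f(t + h)) - math.log(f(t - h))) / (2 * h)

t0 = 5.0
g_x, g_z = growth_rate(X, t0), growth_rate(Z, t0)
g_product = growth_rate(lambda t: X(t) * Z(t), t0)   # Y = X * Z
g_ratio = growth_rate(lambda t: X(t) / Z(t), t0)     # Y = X / Z

print(f"g_X = {g_x:.6f}, g_Z = {g_z:.6f}")
print(f"g(XZ)  = {g_product:.6f} vs g_X + g_Z = {g_x + g_z:.6f}")
print(f"g(X/Z) = {g_ratio:.6f} vs g_X - g_Z = {g_x - g_z:.6f}")
```

At $t = 5$ the analytic rates are $g_X = 50/1250 = 0.04$ and $g_Z = 0.1/1.45 \approx 0.069$, and the numerical growth rates of the product and the ratio agree with their sum and difference.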

Now let $Y = X + Z$. What is the rate of growth of Y? Let Y = total employment, X = blue collar
employment, and Z = white collar employment. Since

$$\ln(X + Z) \neq \ln X + \ln Z,$$

it is not easy to compute the rate of growth of Y, but with some algebra, it can be shown that

$$g_Y = \frac{X}{X + Z}\,g_X + \frac{Z}{X + Z}\,g_Z \tag{36}$$

That is, the rate of growth of a sum is a weighted average of the rates of growth of its components.
For our example, the rate of growth of total employment is a weighted average of the rates of growth
of white collar employment and blue collar employment, the weights being the share of each
component in total employment.
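Eq. (36) can be checked the same way. In this sketch, blue collar employment X grows linearly and white collar employment Z grows exponentially; both paths are hypothetical. The numerically computed growth rate of X + Z matches the share-weighted average of $g_X$ and $g_Z$, not their sum.

```python
import math

def X(t):
    return 60.0 + 2.0 * t                 # hypothetical blue collar employment

def Z(t):
    return 40.0 * math.exp(0.05 * t)      # hypothetical white collar employment

def growth_rate(f, t, h=1e-5):
    """Instantaneous growth rate d ln f(t) / dt, by central difference."""
    return (math.log(f(t + h)) - math.log(f(t - h))) / (2 * h)

t0 = 3.0
g_x, g_z = growth_rate(X, t0), growth_rate(Z, t0)
w_x = X(t0) / (X(t0) + Z(t0))             # share of X in the total
w_z = Z(t0) / (X(t0) + Z(t0))             # share of Z in the total
g_sum = growth_rate(lambda t: X(t) + Z(t), t0)

print(f"g_X = {g_x:.6f}, g_Z = {g_z:.6f}")
print(f"g(X+Z) = {g_sum:.6f}, weighted average = {w_x * g_x + w_z * g_z:.6f}")
print(f"naive g_X + g_Z = {g_x + g_z:.6f}")
```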


6A.5 Box-Cox Regression Model 


Consider the following regression model

$$Y_i^{\lambda} = \beta_1 + \beta_2 X_i + u_i, \qquad Y > 0 \tag{37}$$

where $\lambda$ (Greek lambda) is a parameter, which may be negative, zero, or positive. Since Y is raised to
the power $\lambda$, we will get several transformations of Y, depending on the value of $\lambda$.

Equation (37) is known as the Box-Cox regression model, named after the statisticians Box and
Cox.¹ Depending on the value of $\lambda$, we have the following regression models, which are shown in
tabular form:

Value of λ    Regression Model
1             $Y_i = \beta_1 + \beta_2 X_i + u_i$
2             $Y_i^2 = \beta_1 + \beta_2 X_i + u_i$
0.5           $\sqrt{Y_i} = \beta_1 + \beta_2 X_i + u_i$
0             $\ln Y_i = \beta_1 + \beta_2 X_i + u_i$
-0.5          $1/\sqrt{Y_i} = \beta_1 + \beta_2 X_i + u_i$
-1.0          $1/Y_i = \beta_1 + \beta_2 X_i + u_i$


As you can see, linear and log-linear models are special cases of the Box-Cox family of 
transformations. 

Of course, we can apply such transformations to the X variable(s) also. It is interesting to note that
when $\lambda$ is zero, we get the log-transformation of Y; strictly, this holds for the scaled form
$(Y^{\lambda} - 1)/\lambda$, whose limit as $\lambda \to 0$ is $\ln Y$. The proof of this is slightly involved and is best
left for the references. (Calculus-minded readers will have to recall L'Hôpital's rule.)

But how do we actually determine the appropriate value of $\lambda$ in a given situation? We cannot
estimate Eq. (37) directly, for it involves not only the regression parameters $\beta_1$ and $\beta_2$ but also $\lambda$,
which enters nonlinearly. But it can be shown that we can use the method of maximum likelihood to
estimate all these parameters. Regression packages exist to do just that.

We will not pursue this topic here because the procedure is somewhat involved. 

However, we can proceed by trial and error. Choose several values of $\lambda$, transform Y accordingly,
run regression (37), and obtain the residual sum of squares (RSS) for each transformed regression.
Choose the value of $\lambda$ that gives the minimum RSS.²
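A minimal sketch of the trial-and-error search follows. One caution: RSS values from regressions on differently transformed Y are measured in different units, so they are only comparable after rescaling. The sketch therefore uses the geometric-mean-scaled form $W_i = (Y_i^{\lambda} - 1)/(\lambda \dot{Y}^{\lambda - 1})$ for $\lambda \neq 0$ and $W_i = \dot{Y} \ln Y_i$ for $\lambda = 0$, where $\dot{Y}$ is the geometric mean of Y. This is a standardization of the kind described in the Neter et al. reference, swapped in here in place of the raw $Y^{\lambda}$. The data are hypothetical, generated so that ln Y is almost exactly linear in X, and the grid of $\lambda$ values and the function names are illustrative.

```python
import math

def ols_rss(x, w):
    """Residual sum of squares from OLS of w on x with an intercept."""
    n = len(x)
    mx, mw = sum(x) / n, sum(w) / n
    b2 = sum((a - mx) * (b - mw) for a, b in zip(x, w)) / sum((a - mx) ** 2 for a in x)
    b1 = mw - b2 * mx
    return sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, w))

def scaled_boxcox(y, lam):
    """Geometric-mean-scaled Box-Cox transform of y (all y > 0)."""
    gm = math.exp(sum(math.log(v) for v in y) / len(y))   # geometric mean of y
    if lam == 0:
        return [gm * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * gm ** (lam - 1)) for v in y]

def choose_lambda(x, y, grid):
    """Trial and error: return the lambda with the smallest RSS, plus all RSS values."""
    rss = {lam: ols_rss(x, scaled_boxcox(y, lam)) for lam in grid}
    return min(rss, key=rss.get), rss

# Hypothetical data: ln Y is linear in X up to a small alternating error
x = [float(i) for i in range(1, 21)]
y = [math.exp(1.0 + 0.1 * xi + (0.01 if i % 2 else -0.01)) for i, xi in enumerate(x)]

best, rss_by_lambda = choose_lambda(x, y, [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0])
for lam in sorted(rss_by_lambda):
    print(f"lambda = {lam:5.1f}: RSS = {rss_by_lambda[lam]:.4f}")
print("lambda with minimum RSS:", best)
```

With these data the search picks $\lambda = 0$, i.e., the log-linear model. In practice a maximum likelihood routine, or at least a much finer grid, would be preferred.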


¹G.E.P. Box and D.R. Cox, "An Analysis of Transformations," Journal of the Royal Statistical Society, B26,
1964, pp. 211-243.

²For an accessible discussion, refer to John Neter, Michael Kutner, Christopher Nachtsheim, and
William Wasserman, Applied Linear Regression Models, 3rd ed., Richard D. Irwin, Chicago, 1996.






Chapter 7

Multiple Regression Analysis: The Problem of Estimation


The two-variable model studied extensively in the previous chapters is often inadequate in practice. In our consumption-income example (Example 3.1), for instance, it was assumed implicitly that only income X is related to consumption Y. But economic theory is seldom so simple for, besides income, a number of other variables are also likely to affect consumption expenditure. An obvious example is wealth of the consumer. As another example, the demand for a commodity is likely to depend not only on its own price but also on the prices of other competing or complementary goods, income of the consumer, social status, etc. Therefore, we need to extend our simple two-variable regression model to cover models involving more than two variables. Adding more variables leads us to the discussion of multiple regression models, that is, models in which the dependent variable, or regressand, Y depends on two or more explanatory variables, or regressors.

The simplest possible multiple regression model is three-variable regression, with one 
dependent variable and two explanatory variables. In this and the next chapter we shall 
study this model. Throughout, we are concerned with multiple linear regression models, 
that is, models linear in the parameters; they may or may not be linear in the variables. 

7.1 The Three-Variable Model: Notation and Assumptions

Generalizing the two-variable population regression function (PRF) Eq. (2.4.2), we may 
write the three-variable PRF as 

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (7.1.1)$$

where Y is the dependent variable, $X_2$ and $X_3$ the explanatory variables (or regressors), u the stochastic disturbance term, and i the ith observation; in case the data are time series, the subscript t will denote the tth observation.¹

For notational symmetry, Eq. (7.1.1) can also be written as

$$Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

with the provision that $X_{1i} = 1$ for all i.


In Eq. (7.1.1) $\beta_1$ is the intercept term. As usual, it gives the mean or average effect on Y of all the variables excluded from the model, although its mechanical interpretation is the average value of Y when $X_2$ and $X_3$ are set equal to zero. The coefficients $\beta_2$ and $\beta_3$ are called the partial regression coefficients, and their meaning will be explained shortly.

We continue to operate within the framework of the classical linear regression model 
(CLRM) first introduced in Chapter 3. As a reminder, we assume the following: 


ASSUMPTIONS

1. Linear regression model, or linear in the parameters. (7.1.2)

2. Fixed X values or X values independent of the error term. Here, this means we require zero covariance between $u_i$ and each X variable:²

$$\text{cov}(u_i, X_{2i}) = \text{cov}(u_i, X_{3i}) = 0 \qquad (7.1.3)$$

3. Zero mean value of the disturbance $u_i$:

$$E(u_i \mid X_{2i}, X_{3i}) = 0 \quad \text{for each } i \qquad (7.1.4)$$

4. Homoscedasticity, or constant variance of $u_i$:

$$\text{var}(u_i) = \sigma^2 \qquad (7.1.5)$$

5. No autocorrelation, or serial correlation, between the disturbances:

$$\text{cov}(u_i, u_j) = 0, \quad i \neq j \qquad (7.1.6)$$

6. The number of observations n must be greater than the number of parameters to be estimated, which is 3 in our current case. (7.1.7)

7. There must be variation in the values of the X variables. (7.1.8)

We will also address two other requirements.

8. No exact collinearity between the X variables, that is, no exact linear relationship between $X_2$ and $X_3$. (7.1.9)

9. There is no specification bias; that is, the model is correctly specified. (7.1.10)

In Section 7.7, we will spend more time discussing the final assumption.


The rationale for assumptions (7.1.2) through (7.1.10) is the same as that discussed in Section 3.2. Assumption (7.1.9), that there is no exact linear relationship between $X_2$ and $X_3$, is technically known as the assumption of no collinearity, or no multicollinearity if more than one exact linear relationship is involved.

Informally, no collinearity means none of the regressors can be written as exact linear 
combinations of the remaining regressors in the model. 

Formally, no collinearity means that there exists no set of numbers, $\lambda_2$ and $\lambda_3$, not both zero, such that

$$\lambda_2 X_{2i} + \lambda_3 X_{3i} = 0 \qquad (7.1.11)$$


2 This assumption is automatically fulfilled if $X_2$ and $X_3$ are nonstochastic and Eq. (7.1.4) holds.
If such an exact linear relationship exists, then $X_2$ and $X_3$ are said to be collinear or linearly dependent. On the other hand, if Eq. (7.1.11) holds true only when $\lambda_2 = \lambda_3 = 0$, then $X_2$ and $X_3$ are said to be linearly independent.

Thus, if

$$X_{2i} = -4X_{3i} \quad \text{or} \quad X_{2i} + 4X_{3i} = 0 \qquad (7.1.12)$$

the two variables are linearly dependent, and if both are included in a regression model, we 
will have perfect collinearity or an exact linear relationship between the two regressors. 

Although we shall consider the problem of multicollinearity in depth in Chapter 10, intuitively the logic behind the assumption of no multicollinearity is not too difficult to grasp. Suppose that in Eq. (7.1.1) Y, $X_2$, and $X_3$ represent consumption expenditure, income, and wealth of the consumer, respectively. In postulating that consumption expenditure is linearly related to income and wealth, economic theory presumes that wealth and income may have some independent influence on consumption. If not, there is no sense in including both income and wealth variables in the model. In the extreme, if there is an exact linear relationship between income and wealth, we have only one independent variable, not two, and there is no way to assess the separate influence of income and wealth on consumption. To see this clearly, let $X_{3i} = 2X_{2i}$ in the consumption-income-wealth regression. Then the regression (7.1.1) becomes

$$\begin{aligned} Y_i &= \beta_1 + \beta_2 X_{2i} + \beta_3 (2X_{2i}) + u_i \\ &= \beta_1 + (\beta_2 + 2\beta_3) X_{2i} + u_i \\ &= \beta_1 + \alpha X_{2i} + u_i \end{aligned} \qquad (7.1.13)$$

where $\alpha = (\beta_2 + 2\beta_3)$. That is, we in fact have a two-variable and not a three-variable regression. Moreover, if we run the regression (7.1.13) and obtain $\hat\alpha$, there is no way to estimate the separate influence of $X_2$ ($= \beta_2$) and $X_3$ ($= \beta_3$) on Y, for $\alpha$ gives the combined influence of $X_2$ and $X_3$ on Y.³
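A small numerical sketch makes the point concrete. The data and parameter values below are hypothetical; NumPy's `matrix_rank` and `lstsq` are used to show that with $X_{3i} = 2X_{2i}$ the data matrix has rank 2 rather than 3, that two different $(\beta_2, \beta_3)$ pairs with the same $\alpha = \beta_2 + 2\beta_3$ generate identical values of Y, and that a regression of Y on $X_2$ recovers only $\alpha$.

```python
import numpy as np

# Hypothetical regressor values with an exact linear relationship X3 = 2 X2
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3 = 2.0 * x2

beta1 = 10.0
y_a = beta1 + 0.5 * x2 + 0.3 * x3   # beta2 = 0.5, beta3 = 0.3, so alpha = 1.1
y_b = beta1 + 1.1 * x2 + 0.0 * x3   # beta2 = 1.1, beta3 = 0.0, so alpha = 1.1

X_full = np.column_stack([np.ones_like(x2), x2, x3])
rank = np.linalg.matrix_rank(X_full)   # 2, not 3: the columns are linearly dependent

# Regressing Y on X2 alone recovers only the combined coefficient alpha
X_short = np.column_stack([np.ones_like(x2), x2])
alpha_hat = np.linalg.lstsq(X_short, y_a, rcond=None)[0][1]

print(rank)                    # 2
print(np.allclose(y_a, y_b))   # True: the data cannot tell the two pairs apart
print(round(alpha_hat, 6))     # 1.1
```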

In short, the assumption of no multicollinearity requires that in the PRF we include only 
those variables that are not exact linear functions of one or more variables in the model. 
Although we will discuss this topic more fully in Chapter 10, a couple of points may be 
noted here. 

First, the assumption of no multicollinearity pertains to our theoretical (i.e., PRF) 
model. In practice, when we collect data for empirical analysis there is no guarantee that 
there will not be correlations among the regressors. As a matter of fact, in most applied work it is almost impossible to find two or more (economic) variables that are not correlated at least to some extent, as we will show in our illustrative examples later in the chapter.
What we require is that there be no exact linear relationships among the regressors, as in 
Eq. (7.1.12). 

Second, keep in mind that we are talking only about perfect linear relationships between two or more variables. The assumption of no multicollinearity does not rule out nonlinear relationships between variables. Suppose $X_{3i} = X_{2i}^2$. This does not violate the assumption of no perfect collinearity, as the relationship between the variables here is nonlinear.
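By contrast, a nonlinear relationship such as $X_{3i} = X_{2i}^2$ leaves the data matrix with full column rank, even though the two regressors are highly correlated. A quick check with hypothetical values:

```python
import numpy as np

x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical values
x3 = x2 ** 2                                    # exact, but nonlinear, relationship
X = np.column_stack([np.ones_like(x2), x2, x3])

rank = np.linalg.matrix_rank(X)    # 3: no exact linear dependence among the columns
r23 = np.corrcoef(x2, x3)[0, 1]    # high, but strictly less than 1
print(rank, round(r23, 3))
```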


3 Mathematically speaking, $\alpha = (\beta_2 + 2\beta_3)$ is one equation in two unknowns and there is no unique way of estimating $\beta_2$ and $\beta_3$ from the estimated $\alpha$.
7.2 Interpretation of Multiple Regression Equation

Given the assumptions of the classical regression model, it follows that, on taking the conditional expectation of Y on both sides of Eq. (7.1.1), we obtain

$$E(Y_i \mid X_{2i}, X_{3i}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} \qquad (7.2.1)$$

In words, Eq. (7.2.1) gives the conditional mean or expected value of Y conditional upon the given or fixed values of $X_2$ and $X_3$. Therefore, as in the two-variable case, multiple regression analysis is regression analysis conditional upon the fixed values of the regressors, and what we obtain is the average or mean value of Y or the mean response of Y for the given values of the regressors.


7.3 The Meaning of Partial Regression Coefficients 


As mentioned earlier, the regression coefficients $\beta_2$ and $\beta_3$ are known as partial regression or partial slope coefficients. The meaning of a partial regression coefficient is as follows: $\beta_2$ measures the change in the mean value of Y, E(Y), per unit change in $X_2$, holding the value of $X_3$ constant. Put differently, it gives the “direct” or the “net” effect of a unit change in $X_2$ on the mean value of Y, net of any effect that $X_3$ may have on mean Y. Likewise, $\beta_3$ measures the change in the mean value of Y per unit change in $X_3$, holding the value of $X_2$ constant.⁴ That is, it gives the “direct” or “net” effect of a unit change in $X_3$ on the mean value of Y, net of any effect that $X_2$ may have on mean Y.⁵

How do we actually go about holding the influence of a regressor constant? To explain 
this, let us revert to our child mortality example (Example 6.6). Recall that in that example, 
Y = child mortality (CM), X 2 = per capita GNP (PGNP), and X 3 = female literacy rate 
(FLR). Let us suppose we want to hold the influence of FLR constant. Since FLR may 
have some effect on CM as well as PGNP in any given concrete data, what we can do is 
remove the (linear) influence of FLR from both CM and PGNP by running the regression of 
CM on FLR and of PGNP on FLR separately and then looking at the residuals obtained from 
these regressions. Using the data given in Table 6.4, we obtain the following regressions: 

$$\text{CM}_i = 263.8635 - 2.3905\,\text{FLR}_i + \hat{u}_{1i} \qquad (7.3.1)$$

se = (12.2249) (0.2133)   $r^2 = 0.6695$


where $\hat{u}_{1i}$ represents the residual term of this regression.

$$\text{PGNP}_i = -39.3033 + 28.1427\,\text{FLR}_i + \hat{u}_{2i} \qquad (7.3.2)$$

se = (734.9526) (12.8211)   $r^2 = 0.0721$


where u 2i represents the residual term of this regression. 


4 The calculus-minded reader will notice at once that $\beta_2$ and $\beta_3$ are the partial derivatives of $E(Y \mid X_2, X_3)$ with respect to $X_2$ and $X_3$.

5 Incidentally, the terms holding constant, controlling for, allowing or accounting for the influence of, correcting the influence of, and sweeping out the influence of are synonymous and will be used interchangeably in this text.
Now

$$\hat{u}_{1i} = (\text{CM}_i - 263.8635 + 2.3905\,\text{FLR}_i) \qquad (7.3.3)$$

represents that part of CM left after removing from it the (linear) influence of FLR. Likewise,

$$\hat{u}_{2i} = (\text{PGNP}_i + 39.3033 - 28.1427\,\text{FLR}_i) \qquad (7.3.4)$$


represents that part of PGNP left after removing from it the (linear) influence of FLR. 

Therefore, if we now regress $\hat{u}_{1i}$ on $\hat{u}_{2i}$, which are “purified” of the (linear) influence of FLR, wouldn’t we obtain the net effect of PGNP on CM? That is indeed the case (see Appendix 7A, Section 7A.2). The regression results are as follows:


$$\hat{u}_{1i} = -0.0056\,\hat{u}_{2i} \qquad (7.3.5)$$

se = (0.0019)   $r^2 = 0.1152$


Note: This regression has no intercept term because the mean value of the OLS residuals $\hat{u}_{1i}$ and $\hat{u}_{2i}$ is zero. (Why?)

The slope coefficient of −0.0056 now gives the “true” or net effect of a unit change in PGNP on CM or the true slope of CM with respect to PGNP. That is, it gives the partial regression coefficient of CM with respect to PGNP, $\beta_2$.

Readers who want to get the partial regression coefficient of CM with respect to FLR can replicate the above procedure by first regressing CM on PGNP and getting the residuals from this regression ($\hat{u}_{1i}$), then regressing FLR on PGNP and obtaining the residuals from this regression ($\hat{u}_{2i}$), and then regressing $\hat{u}_{1i}$ on $\hat{u}_{2i}$. I am sure readers get the idea.
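The two-step “purging” procedure can be checked numerically. Because the Table 6.4 data are not reproduced here, the sketch below generates hypothetical CM, PGNP, and FLR series (the numbers are made up and only loosely mimic the scale of the example). It compares the slope from regressing $\hat{u}_{1i}$ on $\hat{u}_{2i}$ through the origin with the PGNP coefficient from the multiple regression. The helper names `ols` and `resid` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
flr = rng.uniform(10, 95, n)                                 # hypothetical "FLR"
pgnp = 500 + 20 * flr + rng.normal(0, 800, n)                # hypothetical "PGNP", correlated with FLR
cm = 260 - 0.006 * pgnp - 2.2 * flr + rng.normal(0, 30, n)   # hypothetical "CM"

def ols(y, *regressors):
    """OLS coefficients of y on an intercept and the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def resid(y, x):
    """Residuals from regressing y on an intercept and x."""
    b = ols(y, x)
    return y - (b[0] + b[1] * x)

u1 = resid(cm, flr)      # CM purged of the linear influence of FLR
u2 = resid(pgnp, flr)    # PGNP purged of the linear influence of FLR
slope_two_step = (u1 @ u2) / (u2 @ u2)    # regression through the origin

slope_multiple = ols(cm, pgnp, flr)[1]    # PGNP coefficient in the multiple regression
print(np.isclose(slope_two_step, slope_multiple))   # True
```

The equality holds exactly in every sample, not merely on average; it is often referred to as the Frisch-Waugh-Lovell theorem.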

Do we have to go through this multistep procedure every time we want to find out the 
true partial regression coefficient? Fortunately, we do not have to do that, for the same job 
can be accomplished fairly quickly and routinely by the OLS procedure discussed in the 
next section. The multistep procedure just outlined is merely for pedagogic purposes to 
drive home the meaning of “partial” regression coefficient. 


7.4 OLS and ML Estimation of the Partial 
Regression Coefficients 

To estimate the parameters of the three-variable regression model (7.1.1), we first consider 
the method of ordinary least squares (OLS) introduced in Chapter 3 and then consider 
briefly the method of maximum likelihood (ML) discussed in Chapter 4. 

OLS Estimators 

To find the OLS estimators, let us first write the sample regression function (SRF) corresponding to the PRF of Eq. (7.1.1) as follows:

$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat{u}_i \qquad (7.4.1)$$

where $\hat{u}_i$ is the residual term, the sample counterpart of the stochastic disturbance term $u_i$.
As noted in Chapter 3, the OLS procedure consists of choosing the values of the unknown parameters so that the residual sum of squares (RSS) $\sum \hat{u}_i^2$ is as small as possible. Symbolically,


$$\min \sum \hat{u}_i^2 = \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2 \qquad (7.4.2)$$


where the expression for the RSS is obtained by simple algebraic manipulations of 
Eq. (7.4.1). 

The most straightforward procedure to obtain the estimators that will minimize Eq. (7.4.2) is to differentiate it with respect to the unknowns, set the resulting expressions to zero, and solve them simultaneously. As shown in Appendix 7A, Section 7A.1, this procedure gives the following normal equations [cf. Eqs. (3.1.4) and (3.1.5)]:


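$$\bar{Y} = \hat\beta_1 + \hat\beta_2 \bar{X}_2 + \hat\beta_3 \bar{X}_3 \qquad (7.4.3)$$

$$\sum Y_i X_{2i} = \hat\beta_1 \sum X_{2i} + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_3 \sum X_{2i} X_{3i} \qquad (7.4.4)$$

$$\sum Y_i X_{3i} = \hat\beta_1 \sum X_{3i} + \hat\beta_2 \sum X_{2i} X_{3i} + \hat\beta_3 \sum X_{3i}^2 \qquad (7.4.5)$$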


From Eq. (7.4.3) we see at once that 

$$\hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}_2 - \hat\beta_3 \bar{X}_3 \qquad (7.4.6)$$

which is the OLS estimator of the population intercept $\beta_1$.

Following the convention of letting the lowercase letters denote deviations from sample 
mean values, one can derive the following formulas from the normal equations (7.4.3) 
to (7.4.5): 


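$$\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2} \qquad (7.4.7)^{6}$$

$$\hat\beta_3 = \frac{\left(\sum y_i x_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_i x_{2i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2} \qquad (7.4.8)$$

which give the OLS estimators of the population partial regression coefficients $\beta_2$ and $\beta_3$, respectively.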

In passing, note the following: (1) Equations (7.4.7) and (7.4.8) are symmetrical in nature because one can be obtained from the other by interchanging the roles of $X_2$ and $X_3$; (2) the denominators of these two equations are identical; and (3) the three-variable case is a natural extension of the two-variable case.
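Formulas (7.4.6) through (7.4.8) translate directly into code. The sketch below, using hypothetical simulated data, computes the estimates in deviation form and checks them against a generic least-squares solve (`numpy.linalg.lstsq`); the function name `three_var_ols` is illustrative.

```python
import numpy as np

def three_var_ols(Y, X2, X3):
    """OLS estimates of (beta1, beta2, beta3) via Eqs. (7.4.6)-(7.4.8)."""
    y = Y - Y.mean()                       # lowercase = deviations from sample means
    x2 = X2 - X2.mean()
    x3 = X3 - X3.mean()
    s22, s33, s23 = x2 @ x2, x3 @ x3, x2 @ x3
    sy2, sy3 = y @ x2, y @ x3
    den = s22 * s33 - s23**2               # common denominator of (7.4.7) and (7.4.8)
    b2 = (sy2 * s33 - sy3 * s23) / den     # (7.4.7)
    b3 = (sy3 * s22 - sy2 * s23) / den     # (7.4.8)
    b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()   # (7.4.6)
    return np.array([b1, b2, b3])

rng = np.random.default_rng(1)
n = 30
X2 = rng.normal(10, 2, n)
X3 = rng.normal(5, 1, n)
Y = 3 + 1.5 * X2 - 2 * X3 + rng.normal(0, 1, n)   # hypothetical model

b_formula = three_var_ols(Y, X2, X3)
b_lstsq = np.linalg.lstsq(np.column_stack([np.ones(n), X2, X3]), Y, rcond=None)[0]
print(np.allclose(b_formula, b_lstsq))   # True
```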


6 This estimator is equal to that of Eq. (7.3.5), as shown in App. 7A, Sec. 7A.2. 
Variances and Standard Errors of OLS Estimators

Having obtained the OLS estimators of the partial regression coefficients, we can derive the variances and standard errors of these estimators in the manner indicated in Appendix 3A, Section 3A.3. As in the two-variable case, we need the standard errors for two main purposes: to establish confidence intervals and to test statistical hypotheses. The relevant formulas are as follows:⁷


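$$\text{var}(\hat\beta_1) = \left[\frac{1}{n} + \frac{\bar{X}_2^2 \sum x_{3i}^2 + \bar{X}_3^2 \sum x_{2i}^2 - 2\bar{X}_2 \bar{X}_3 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}\right] \cdot \sigma^2 \qquad (7.4.9)$$

$$\text{se}(\hat\beta_1) = +\sqrt{\text{var}(\hat\beta_1)} \qquad (7.4.10)$$

$$\text{var}(\hat\beta_2) = \frac{\sum x_{3i}^2}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}\,\sigma^2 \qquad (7.4.11)$$

or, equivalently,

$$\text{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left(1 - r_{23}^2\right)} \qquad (7.4.12)$$

where $r_{23}$ is the sample coefficient of correlation between $X_2$ and $X_3$ as defined in Chapter 3.⁸

$$\text{se}(\hat\beta_2) = +\sqrt{\text{var}(\hat\beta_2)} \qquad (7.4.13)$$

$$\text{var}(\hat\beta_3) = \frac{\sum x_{2i}^2}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}\,\sigma^2 \qquad (7.4.14)$$

or, equivalently,

$$\text{var}(\hat\beta_3) = \frac{\sigma^2}{\sum x_{3i}^2 \left(1 - r_{23}^2\right)} \qquad (7.4.15)$$

$$\text{se}(\hat\beta_3) = +\sqrt{\text{var}(\hat\beta_3)} \qquad (7.4.16)$$

$$\text{cov}(\hat\beta_2, \hat\beta_3) = \frac{-r_{23}\,\sigma^2}{\left(1 - r_{23}^2\right)\sqrt{\sum x_{2i}^2}\,\sqrt{\sum x_{3i}^2}} \qquad (7.4.17)$$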


In all these formulas $\sigma^2$ is the (homoscedastic) variance of the population disturbances $u_i$.

Following the argument of Appendix 3A, Section 3A.5, the reader can verify that an unbiased estimator of $\sigma^2$ is given by

$$\hat\sigma^2 = \frac{\sum \hat{u}_i^2}{n - 3} \qquad (7.4.18)$$
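As a check on Eqs. (7.4.9) through (7.4.18), the following sketch uses hypothetical simulated data to compute $\hat\sigma^2$ and then the variances and the covariance of the estimators from the scalar formulas, with $\hat\sigma^2$ in place of the unknown $\sigma^2$. For comparison it uses the standard matrix expression $\hat\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ (the matrix approach is the subject of Appendix C).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X2 = rng.normal(10, 2, n)
X3 = 0.5 * X2 + rng.normal(0, 1, n)               # regressors correlated by construction
Y = 3 + 1.5 * X2 - 2 * X3 + rng.normal(0, 1, n)   # hypothetical model

X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
u = Y - X @ b
sigma2_hat = (u @ u) / (n - 3)                    # (7.4.18)

x2, x3 = X2 - X2.mean(), X3 - X3.mean()           # deviations from means
s22, s33, s23 = x2 @ x2, x3 @ x3, x2 @ x3
den = s22 * s33 - s23**2
r23 = s23 / np.sqrt(s22 * s33)

var_b1 = (1 / n + (X2.mean()**2 * s33 + X3.mean()**2 * s22
                   - 2 * X2.mean() * X3.mean() * s23) / den) * sigma2_hat   # (7.4.9)
var_b2 = sigma2_hat / (s22 * (1 - r23**2))        # (7.4.12), equivalent to (7.4.11)
var_b3 = sigma2_hat / (s33 * (1 - r23**2))        # (7.4.15), equivalent to (7.4.14)
cov_b23 = -r23 * sigma2_hat / ((1 - r23**2) * np.sqrt(s22) * np.sqrt(s33))   # (7.4.17)
se_b1, se_b2, se_b3 = np.sqrt([var_b1, var_b2, var_b3])   # (7.4.10), (7.4.13), (7.4.16)

V = sigma2_hat * np.linalg.inv(X.T @ X)           # matrix form, for comparison
print(np.allclose([var_b1, var_b2, var_b3, cov_b23],
                  [V[0, 0], V[1, 1], V[2, 2], V[1, 2]]))   # True
```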


7 The derivations of these formulas are easier using matrix notation. Advanced readers may refer to Appendix C.

8 Using the definition of r given in Chapter 3, we have

$$r_{23}^2 = \frac{\left(\sum x_{2i} x_{3i}\right)^2}{\sum x_{2i}^2 \sum x_{3i}^2}$$
Note the similarity between this estimator of $\sigma^2$ and its two-variable counterpart [$\hat\sigma^2 = \sum \hat{u}_i^2/(n - 2)$]. The degrees of freedom are now (n − 3) because in estimating $\sum \hat{u}_i^2$ we must first estimate $\beta_1$, $\beta_2$, and $\beta_3$, which consume 3 df. (The argument is quite general. Thus, in the four-variable case the df will be n − 4.)

The estimator $\hat\sigma^2$ can be computed from Eq. (7.4.18) once the residuals are available, but it can also be obtained more readily by using the following relation (for proof, see Appendix 7A, Section 7A.3):

$$\sum \hat{u}_i^2 = \sum y_i^2 - \hat\beta_2 \sum y_i x_{2i} - \hat\beta_3 \sum y_i x_{3i} \qquad (7.4.19)$$


which is the three-variable counterpart of the relation given in Eq. (3.3.6). 
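The shortcut (7.4.19) can be verified against the directly computed residual sum of squares. The data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 5, n)
Y = 1 + 0.8 * X2 + 1.2 * X3 + rng.normal(0, 1, n)   # hypothetical model

X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
u = Y - X @ b

y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
rss_direct = u @ u                                          # sum of squared residuals
rss_shortcut = y @ y - b[1] * (y @ x2) - b[2] * (y @ x3)    # (7.4.19)
sigma2_hat = rss_shortcut / (n - 3)                         # (7.4.18)
print(np.isclose(rss_direct, rss_shortcut))                 # True
```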


Properties of OLS Estimators 

The properties of OLS estimators of the multiple regression model parallel those of the 
two-variable model. Specifically: 

1. The three-variable regression line (surface) passes through the means $\bar{Y}$, $\bar{X}_2$, and $\bar{X}_3$, which is evident from Eq. (7.4.3) (cf. Eq. [3.1.7] of the two-variable model). This property holds generally. Thus in the k-variable linear regression model (a regressand and [k − 1] regressors)

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (7.4.20)$$

we have

$$\hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}_2 - \hat\beta_3 \bar{X}_3 - \cdots - \hat\beta_k \bar{X}_k \qquad (7.4.21)$$

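2. The mean value of the estimated $Y_i$ ($= \hat{Y}_i$) is equal to the mean value of the actual $Y_i$, which is easy to prove:

$$\begin{aligned} \hat{Y}_i &= \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} \\ &= (\bar{Y} - \hat\beta_2 \bar{X}_2 - \hat\beta_3 \bar{X}_3) + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} \quad \text{(Why?)} \\ &= \bar{Y} + \hat\beta_2 (X_{2i} - \bar{X}_2) + \hat\beta_3 (X_{3i} - \bar{X}_3) \\ &= \bar{Y} + \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} \end{aligned} \qquad (7.4.22)$$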

where as usual small letters indicate values of the variables as deviations from their 
respective means. 

Summing both sides of Eq. (7.4.22) over the sample values and dividing through by the sample size n gives $\bar{\hat{Y}} = \bar{Y}$. (Note: $\sum x_{2i} = \sum x_{3i} = 0$. Why?) Notice that by virtue of Eq. (7.4.22) we can write

$$\hat{y}_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} \qquad (7.4.23)$$

where $\hat{y}_i = (\hat{Y}_i - \bar{Y})$.

Therefore, the SRF (7.4.1) can be expressed in the deviation form as 

$$y_i = \hat{y}_i + \hat{u}_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \hat{u}_i \qquad (7.4.24)$$

3. $\sum \hat{u}_i = \bar{\hat{u}} = 0$, which can be verified from Eq. (7.4.24). (Hint: Sum both sides of Eq. [7.4.24] over the sample values.)

4. The residuals $\hat{u}_i$ are uncorrelated with $X_{2i}$ and $X_{3i}$, that is, $\sum \hat{u}_i X_{2i} = \sum \hat{u}_i X_{3i} = 0$ (see Appendix 7A.1 for proof).

5. The residuals $\hat{u}_i$ are uncorrelated with $\hat{Y}_i$; that is, $\sum \hat{u}_i \hat{Y}_i = 0$. Why? (Hint: Multiply Eq. [7.4.23] on both sides by $\hat{u}_i$ and sum over the sample values.)

6. From Eqs. (7.4.12) and (7.4.15) it is evident that as $r_{23}$, the correlation coefficient between $X_2$ and $X_3$, increases toward 1, the variances of $\hat\beta_2$ and $\hat\beta_3$ increase for given values of $\sigma^2$ and $\sum x_{2i}^2$ or $\sum x_{3i}^2$. In the limit, when $r_{23} = 1$ (i.e., perfect collinearity), these variances become infinite. The implications of this will be explored fully in Chapter 10, but intuitively the reader can see that as $r_{23}$ increases it is going to be increasingly difficult to know what the true values of $\beta_2$ and $\beta_3$ are. (More on this in the next chapter, but refer to Eq. [7.1.13].)

7. It is also clear from Eqs. (7.4.12) and (7.4.15) that for given values of $r_{23}$ and $\sum x_{2i}^2$ or $\sum x_{3i}^2$, the variances of the OLS estimators are directly proportional to $\sigma^2$; that is, they increase as $\sigma^2$ increases. Similarly, for given values of $\sigma^2$ and $r_{23}$, the variance of $\hat\beta_2$ is inversely proportional to $\sum x_{2i}^2$; that is, the greater the variation in the sample values of $X_2$, the smaller the variance of $\hat\beta_2$ and therefore $\beta_2$ can be estimated more precisely. A similar statement can be made about the variance of $\hat\beta_3$.

8. Given the assumptions of the classical linear regression model, which are spelled out in Section 7.1, one can prove that the OLS estimators of the partial regression coefficients not only are linear and unbiased but also have minimum variance in the class of all linear unbiased estimators. In short, they are BLUE. Put differently, they satisfy the Gauss-Markov theorem. (The proof parallels the two-variable case proved in Appendix 3A, Section 3A.6 and will be presented more compactly using matrix notation in Appendix C.)
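Properties 1 through 5 are algebraic consequences of the normal equations, so they hold exactly (up to rounding) in any sample. The sketch below checks them on hypothetical simulated data; properties 6 through 8 concern sampling behavior and are not checked here.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
X2 = rng.normal(5, 1, n)
X3 = rng.normal(2, 1, n)
Y = 4 + 2 * X2 - X3 + rng.normal(0, 0.5, n)   # hypothetical model

X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
Y_hat = X @ b
u = Y - Y_hat

checks = {
    "1. surface passes through the means": np.isclose(b[0] + b[1] * X2.mean() + b[2] * X3.mean(), Y.mean()),
    "2. mean of Y_hat equals mean of Y": np.isclose(Y_hat.mean(), Y.mean()),
    "3. residuals sum to zero": np.isclose(u.sum(), 0.0, atol=1e-9),
    "4. residuals orthogonal to X2 and X3": np.allclose([u @ X2, u @ X3], 0.0, atol=1e-8),
    "5. residuals orthogonal to Y_hat": np.isclose(u @ Y_hat, 0.0, atol=1e-8),
}
for name, ok in checks.items():
    print(f"{name}: {bool(ok)}")
```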

Maximum Likelihood Estimators 

We noted in Chapter 4 that under the assumption that $u_i$, the population disturbances, are normally distributed with zero mean and constant variance $\sigma^2$, the maximum likelihood (ML) estimators and the OLS estimators of the regression coefficients of the two-variable model are identical. This equality extends to models containing any number of variables. (For proof, see Appendix 7A, Section 7A.4.) However, this is not true of the estimator of $\sigma^2$. It can be shown that the ML estimator of $\sigma^2$ is $\sum \hat{u}_i^2/n$ regardless of the number of variables in the model, whereas the OLS estimator of $\sigma^2$ is $\sum \hat{u}_i^2/(n - 2)$ in the two-variable case, $\sum \hat{u}_i^2/(n - 3)$ in the three-variable case, and $\sum \hat{u}_i^2/(n - k)$ in the case of the k-variable model (7.4.20). In short, the OLS estimator of $\sigma^2$ takes into account the number of degrees of freedom, whereas the ML estimator does not. Of course, if n is very large, the ML and OLS estimators of $\sigma^2$ will tend to be close to each other. (Why?)
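The relation between the two estimators of $\sigma^2$ is simple arithmetic: the ML estimator equals the OLS estimator multiplied by (n − k)/n. The sketch below uses an arbitrary, hypothetical RSS of 50 to show this ratio approaching 1 as n grows; the function name is illustrative.

```python
def sigma2_estimates(rss, n, k):
    """Return (OLS, ML) estimates of sigma^2 for a k-variable model with n observations."""
    return rss / (n - k), rss / n

for n in (10, 100, 10_000):
    ols_s2, ml_s2 = sigma2_estimates(rss=50.0, n=n, k=3)
    # The ratio ML/OLS equals (n - k)/n, which approaches 1 as n grows
    print(n, round(ml_s2 / ols_s2, 4))
```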


7.5 The Multiple Coefficient of Determination $R^2$
and the Multiple Coefficient of Correlation R

In the two-variable case we saw that $r^2$ as defined in Eq. (3.5.5) measures the goodness of fit of the regression equation; that is, it gives the proportion or percentage of the total variation in the dependent variable Y explained by the (single) explanatory variable X. This notion of $r^2$ can be easily extended to regression models containing more than two variables. Thus, in the three-variable model we would like to know the proportion of the variation in Y explained by the variables $X_2$ and $X_3$ jointly. The quantity that gives this information is known as the multiple coefficient of determination and is denoted by $R^2$; conceptually it is akin to $r^2$.
To derive $R^2$, we may follow the derivation of $r^2$ given in Section 3.5. Recall that


$$\begin{aligned} Y_i &= \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat{u}_i \\ &= \hat{Y}_i + \hat{u}_i \end{aligned} \qquad (7.5.1)$$

where $\hat{Y}_i$ is the estimated value of $Y_i$ from the fitted regression line and is an estimator of the true $E(Y_i \mid X_{2i}, X_{3i})$. Upon shifting to lowercase letters to indicate deviations from the mean values, Eq. (7.5.1) may be written as


$$\begin{aligned} y_i &= \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \hat{u}_i \\ &= \hat{y}_i + \hat{u}_i \end{aligned} \qquad (7.5.2)$$


Squaring Eq. (7.5.2) on both sides and summing over the sample values, we obtain 


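$$\begin{aligned} \sum y_i^2 &= \sum \hat{y}_i^2 + \sum \hat{u}_i^2 + 2 \sum \hat{y}_i \hat{u}_i \\ &= \sum \hat{y}_i^2 + \sum \hat{u}_i^2 \quad \text{(Why?)} \end{aligned} \qquad (7.5.3)$$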


Verbally, Eq. (7.5.3) states that the total sum of squares (TSS) equals the explained sum of squares (ESS) plus the residual sum of squares (RSS). Now substituting for $\sum \hat{u}_i^2$ from Eq. (7.4.19), we obtain

$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum y_i^2 - \hat\beta_2 \sum y_i x_{2i} - \hat\beta_3 \sum y_i x_{3i}$$

which, on rearranging, gives

$$\text{ESS} = \sum \hat{y}_i^2 = \hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i} \qquad (7.5.4)$$


Now, by definition

$$R^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}}{\sum y_i^2} \qquad (7.5.5)^{9}$$


(cf. Eq. [7.5.5] with Eq. [3.5.6]). 

Since the quantities entering Eq. (7.5.5) are generally computed routinely, $R^2$ can be computed easily. Note that $R^2$, like $r^2$, lies between 0 and 1. If it is 1, the fitted regression line explains 100 percent of the variation in Y. On the other hand, if it is 0, the model does not explain any of the variation in Y. Typically, however, $R^2$ lies between these extreme values. The fit of the model is said to be “better” the closer $R^2$ is to 1.
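The sketch below, on hypothetical simulated data, computes $R^2$ from Eq. (7.5.5), confirms that the ESS formula (7.5.4) matches $\sum \hat{y}_i^2$, and checks the result against the alternative $1 - \text{RSS}/\text{TSS}$ given in footnote 9.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
X2 = rng.normal(0, 1, n)
X3 = rng.normal(0, 1, n)
Y = 1 + 0.7 * X2 + 0.4 * X3 + rng.normal(0, 1, n)   # hypothetical model

X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
u = Y - X @ b
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
y_hat = b[1] * x2 + b[2] * x3                 # fitted values in deviation form, (7.4.23)

tss = y @ y
ess = b[1] * (y @ x2) + b[2] * (y @ x3)       # (7.5.4)
r2 = ess / tss                                # (7.5.5)
r2_alt = 1 - (u @ u) / tss                    # footnote 9
print(np.isclose(ess, y_hat @ y_hat), np.isclose(r2, r2_alt))   # True True
```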


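9 Note that $R^2$ can also be computed as follows:

$$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2} = 1 - \frac{(n - 3)\,\hat\sigma^2}{(n - 1)\,S_y^2}$$

where $S_y^2$ is the sample variance of Y.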
Recall that in the two-variable case we defined the quantity r as the coefficient of correlation and indicated that it measures the degree of (linear) association between two variables. The three-or-more-variable analogue of r is the coefficient of multiple correlation, denoted by R, and it is a measure of the degree of association between Y and all the explanatory variables jointly. Although r can be positive or negative, R is always taken to be positive. In practice, however, R is of little importance. The more meaningful quantity is $R^2$.

Before proceeding further, let us note the following relationship between $R^2$ and the variance of a partial regression coefficient in the k-variable multiple regression model given in Eq. (7.4.20):

$$\text{var}(\hat\beta_j) = \frac{\sigma^2}{\sum x_j^2}\left(\frac{1}{1 - R_j^2}\right) \qquad (7.5.6)$$

where $\hat\beta_j$ is the partial regression coefficient of regressor $X_j$ and $R_j^2$ is the $R^2$ in the regression of $X_j$ on the remaining (k − 2) regressors. (Note: There are [k − 1] regressors in the k-variable regression model.) Although the utility of Eq. (7.5.6) will become apparent in Chapter 10 on multicollinearity, observe that this equation is simply an extension of the formula given in Eq. (7.4.12) or Eq. (7.4.15) for the three-variable regression model, one regressand and two regressors.
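Eq. (7.5.6) can be checked numerically in a four-variable model (three regressors), where each $R_j^2$ comes from an auxiliary regression of $X_j$ on the other regressors. The data are hypothetical, $\hat\sigma^2$ replaces the unknown $\sigma^2$, and the comparison is with the diagonal of $\hat\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. The helper name `auxiliary_r2` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
X2 = rng.normal(0, 1, n)
X3 = 0.6 * X2 + rng.normal(0, 1, n)      # correlated with X2 by construction
X4 = rng.normal(0, 1, n)
Y = 2 + X2 - 0.5 * X3 + 0.3 * X4 + rng.normal(0, 1, n)   # hypothetical model

regressors = [X2, X3, X4]
X = np.column_stack([np.ones(n), *regressors])
k = X.shape[1]                           # k = 4 parameters
b = np.linalg.lstsq(X, Y, rcond=None)[0]
u = Y - X @ b
sigma2_hat = (u @ u) / (n - k)

def auxiliary_r2(j):
    """R_j^2 from regressing regressor j on an intercept and the remaining regressors."""
    others = [r for i, r in enumerate(regressors) if i != j]
    Z = np.column_stack([np.ones(n), *others])
    xj = regressors[j]
    e = xj - Z @ np.linalg.lstsq(Z, xj, rcond=None)[0]
    dev = xj - xj.mean()
    return 1 - (e @ e) / (dev @ dev)

var_756 = np.array([
    sigma2_hat / ((xj - xj.mean()) @ (xj - xj.mean())) / (1 - auxiliary_r2(j))   # (7.5.6)
    for j, xj in enumerate(regressors)
])
var_matrix = np.diag(sigma2_hat * np.linalg.inv(X.T @ X))[1:]
print(np.allclose(var_756, var_matrix))   # True
```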


7.6 An Illustrative Example 


EXAMPLE 7.1

Child Mortality in Relation to per Capita GNP and Female Literacy Rate


In Chapter 6 we considered the behavior of child mortality (CM) in relation to per capita GNP (PGNP). There we found that PGNP has a negative impact on CM, as one would expect. Now let us bring in female literacy as measured by the female literacy rate (FLR). A priori, we expect that FLR too will have a negative impact on CM. Now when we introduce both the variables in our model, we need to net out the influence of each of the regressors. That is, we need to estimate the (partial) regression coefficients of each regressor. Thus our model is:


$$\text{CM}_i = \beta_1 + \beta_2\,\text{PGNP}_i + \beta_3\,\text{FLR}_i + u_i \qquad (7.6.1)$$

The necessary data are given in Table 6.4. Keep in mind that CM is the number of deaths 
of children under five per 1000 live births, PGNP is per capita GNP in 1980, and FLR is 
measured in percent. Our sample consists of 64 countries. 

Using the EViews6 statistical package, we obtained the following results: 

$$\widehat{\text{CM}}_i = 263.6416 - 0.0056\,\text{PGNP}_i - 2.2316\,\text{FLR}_i \qquad (7.6.2)$$

se = (11.5932) (0.0019) (0.2099)   $R^2 = 0.7077$

$\bar{R}^2 = 0.6981$*


where figures in parentheses are the estimated standard errors. Before we interpret this regression, observe the partial slope coefficient of PGNP, namely, −0.0056. Is it not precisely the same as that obtained from the three-step procedure discussed in the previous section (see Eq. [7.3.5])? But should that surprise you? Not only that, but the two standard errors are precisely the same, which is again unsurprising. And we obtained these results without the cumbersome three-step procedure.

*On this, see Section 7.8. 
EXAMPLE 7.1 (Continued)

Let us now interpret these regression coefficients: −0.0056 is the partial regression coefficient of PGNP and tells us that with the influence of FLR held constant, as PGNP increases, say, by a dollar, on average, child mortality goes down by 0.0056 units. To make it more economically interpretable, if the per capita GNP goes up by a thousand dollars, on average, the number of deaths of children under age 5 goes down by about 5.6 per thousand live births. The coefficient −2.2316 tells us that holding the influence of PGNP constant, on average, the number of deaths of children under age 5 goes down by about 2.23 per thousand live births as the female literacy rate increases by one percentage point. The intercept value of about 263, mechanically interpreted, means that if the values of PGNP and FLR were fixed at zero, the mean child mortality rate would be about 263 deaths per thousand live births. Of course, such an interpretation should be taken with a grain of salt. All one could infer is that if the two regressors were fixed at zero, child mortality would be quite high, which makes practical sense. The $R^2$ value of about 0.71 means that about 71 percent of the variation in child mortality is explained by PGNP and FLR, a fairly high value considering that the maximum value of $R^2$ can at most be 1. All told, the regression results make sense.

What about the statistical significance of the estimated coefficients? We will take this topic up in Chapter 8. As we will see there, in many ways that chapter will be an extension of Chapter 5, which dealt with the two-variable model. As we will also show, there are some important differences in statistical inference (i.e., hypothesis testing) between the two-variable and multivariable regression models.


Regression on Standardized Variables 

In the preceding chapter we introduced the topic of regression on standardized variables and stated that the analysis can be extended to multivariable regressions. Recall that a variable is said to be standardized or in standard deviation units if it is expressed in terms of deviation from its mean and divided by its standard deviation.

For our child mortality example, the results are as follows: 

$$\widehat{\text{CM}}^*_i = -0.2026\,\text{PGNP}^*_i - 0.7639\,\text{FLR}^*_i \qquad (7.6.3)$$

se = (0.0713) (0.0713)   $R^2 = 0.7077$

Note: The starred variables are standardized variables. Also note that there is no intercept 
in the model for reasons already discussed in the previous chapter. 

As you can see from this regression, with FLR held constant, a one-standard-deviation increase in PGNP leads, on average, to a 0.2026 standard deviation decrease in CM. Similarly, holding PGNP constant, a one-standard-deviation increase in FLR, on average, leads to a 0.7639 standard deviation decrease in CM. Relatively speaking, female literacy has more impact on child mortality than per capita GNP. Here you can see the advantage of using standardized variables: standardization puts all variables on an equal footing because all standardized variables have zero means and unit variances.
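A standardized coefficient can also be obtained from the ordinary one as $\hat\beta_j^* = \hat\beta_j (S_{X_j}/S_Y)$, where S denotes a sample standard deviation. The sketch below checks this on hypothetical data loosely scaled like the child mortality example (the actual Table 6.4 data are not reproduced here); the regression on standardized variables is run without an intercept, as in Eq. (7.6.3).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 64
X2 = rng.normal(1000, 300, n)     # hypothetical "PGNP"
X3 = rng.normal(50, 20, n)        # hypothetical "FLR"
Y = 260 - 0.006 * X2 - 2.2 * X3 + rng.normal(0, 30, n)   # hypothetical "CM"

def standardize(v):
    """Deviation from the mean divided by the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

# Regression on standardized variables, no intercept
Z = np.column_stack([standardize(X2), standardize(X3)])
beta_star = np.linalg.lstsq(Z, standardize(Y), rcond=None)[0]

# The same numbers from the unstandardized fit: beta_j * S_Xj / S_Y
X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
converted = b[1:] * np.array([X2.std(ddof=1), X3.std(ddof=1)]) / Y.std(ddof=1)
print(np.allclose(beta_star, converted))   # True
```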

Impact on the Dependent Variable of a Unit Change in More 
than One Regressor 

Before proceeding further, suppose we want to find out what would happen to the child 
mortality rate if we were to increase PGNP and FLR simultaneously. Suppose per capita 
GNP were to increase by a dollar and at the same time the female literacy rate were to go 
up by one percentage point. What would be the impact of this simultaneous change on the 
child mortality rate? To find out, all we have to do is multiply the coefficients of PGNP and FLR by the proposed changes and add the resulting terms. In our example this gives us:

$$-0.0056(1) - 2.2316(1) = -2.2372$$

That is, as a result of this simultaneous change in PGNP and FLR, the number of deaths of children under age 5 would go down by about 2.24 per thousand live births.

More generally, if we want to find out the total impact on the dependent variable of a unit change in more than one regressor, all we have to do is multiply the coefficients of those regressors by the proposed changes and add up the products. Note that the intercept term does not enter into these calculations. (Why?)
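The calculation is a sum of the partial slopes weighted by the proposed changes. The sketch below uses the estimated coefficients of Example 7.1; the function name `total_effect` is illustrative.

```python
# Estimated partial slopes from the child mortality regression (Example 7.1)
coef = {"PGNP": -0.0056, "FLR": -2.2316}

def total_effect(coef, changes):
    """Change in mean CM implied by simultaneous changes in the regressors (no intercept)."""
    return sum(coef[name] * delta for name, delta in changes.items())

print(total_effect(coef, {"PGNP": 1, "FLR": 1}))      # about -2.2372
print(total_effect(coef, {"PGNP": 1000, "FLR": 0}))   # about -5.6 (a $1000 rise in PGNP)
```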


7.7 Simple Regression in the Context of Multiple Regression: 
Introduction to Specification Bias 

Recall that assumption (7.1.10) of the classical linear regression model states that the regression model used in the analysis is “correctly” specified; that is, there is no specification bias or specification error (see Chapter 3 for some introductory remarks). Although the topic of specification error will be discussed more fully in Chapter 13, the illustrative example given in the preceding section provides a splendid opportunity not only to drive home the importance of assumption (7.1.10) but also to shed additional light on the meaning of partial regression coefficient and provide a somewhat informal introduction to the topic of specification bias.

Assume that Eq. (7.6.1) is the “true” model explaining the behavior of child mortality in 
relation to per capita GNP and female literacy rate (FLR). But suppose we disregard FLR 
and estimate the following simple regression: 

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + u_{1i} \qquad (7.7.1)$$

where Y = CM and $X_2$ = PGNP. 

Since Eq. (7.6.1) is the true model, estimating Eq. (7.7.1) would constitute a 
specification error; the error here consists in omitting the variable $X_3$, the female literacy rate. Notice 
that we are using different parameter symbols (the alphas) in Eq. (7.7.1) to distinguish them 
from the true parameters (the betas) given in Eq. (7.6.1). 

Now will $\hat{\alpha}_2$ provide an unbiased estimate of the true impact of PGNP, which is given by 
$\beta_2$ in model (7.6.1)? Will $E(\hat{\alpha}_2) = \beta_2$, where $\hat{\alpha}_2$ is the estimated value of $\alpha_2$? In other 
words, will the coefficient of PGNP in Eq. (7.7.1) provide an unbiased estimate of the true 
impact of PGNP on CM, knowing that we have omitted the variable $X_3$ (FLR) from the 
model? As you would suspect, in general, $\hat{\alpha}_2$ will not be an unbiased estimator of the true 
$\beta_2$. To give a glimpse of the bias, let us run the regression (7.7.1), which gave the 
following results. 

$$\widehat{\text{CM}}_i = 157.4244 - 0.0114\,\text{PGNP}_i \qquad (7.7.2)$$

se = (9.8455) (0.0032)    $r^2 = 0.1662$

Observe several things about this regression compared to the “true” multiple 
regression (7.6.1): 

1. In absolute terms (i.e., disregarding the sign), the PGNP coefficient has increased from 
0.0056 to 0.0114, almost a twofold increase. 
2. The standard errors are different. 

3. The intercept values are different. 

4. The $R^2$ values are dramatically different, although it is generally the case that, as the 
number of regressors in the model increases, the $R^2$ value increases. 

Now suppose that you regress child mortality on female literacy rate, disregarding the 
influence of PGNP. You will obtain the following results: 

$$\widehat{\text{CM}}_i = 263.8635 - 2.3905\,\text{FLR}_i \qquad (7.7.3)$$

se = (21.2249) (0.2133)    $r^2 = 0.6696$

Again if you compare the results of this (misspecified) regression with the “true” 
multiple regression, you will see that the results are different, although the difference here is not 
as noticeable as in the case of regression (7.7.2). 

The important point to note is that serious consequences can ensue if you misfit a model. 
We will look into this topic more thoroughly in Chapter 13, on specification errors. 
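The direction of the bias can be illustrated with a small simulation. The sketch below assumes a hypothetical “true” model with made-up parameters ($\beta_2 = -1$, $\beta_3 = -2$) and an $X_3$ that is positively correlated with $X_2$; it is not the child mortality data. It then checks the standard in-sample omitted-variable identity: the slope from the short regression equals $\hat{\beta}_2 + \hat{\beta}_3 b_{32}$, where $b_{32}$ is the slope from regressing the omitted $X_3$ on $X_2$. The helper `ols` is our own.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients [intercept, slope_1, slope_2, ...] via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical "true" model: Y = 5 - 1*X2 - 2*X3 + u, with X3 positively correlated with X2
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(10.0, 2.0, n)
x3 = 0.5 * x2 + rng.normal(0.0, 1.0, n)
y = 5.0 - 1.0 * x2 - 2.0 * x3 + rng.normal(0.0, 1.0, n)

b = ols(y, x2, x3)    # correctly specified (long) regression
a = ols(y, x2)        # misspecified (short) regression that omits X3
b32 = ols(x3, x2)[1]  # slope from the auxiliary regression of X3 on X2

print("slope on X2, long regression :", round(b[1], 4))
print("slope on X2, short regression:", round(a[1], 4))
# In-sample identity: short slope = long slope on X2 + (long slope on X3) * b32
print("b2_hat + b3_hat * b32        :", round(b[1] + b[2] * b32, 4))
```

Because $\beta_3 < 0$ and $b_{32} > 0$ in this setup, the short regression overstates the (negative) effect of $X_2$ in absolute value, a pattern similar to the jump in the PGNP coefficient seen above.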

7.8 $R^2$ and the Adjusted $R^2$

An important property of $R^2$ is that it is a nondecreasing function of the number of 
explanatory variables or regressors present in the model, unless the added variable is 
perfectly collinear with the other regressors; as the number of regressors increases, $R^2$ almost 
invariably increases and never decreases. Stated differently, an additional X variable will 
not decrease $R^2$. Compare, for instance, regression (7.7.2) or (7.7.3) with (7.6.2). To see 
this, recall the definition of the coefficient of determination: 


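$$R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2} \qquad (7.8.1)$$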


Now $\sum y_i^2$ is independent of the number of X variables in the model because it is simply 
$\sum (Y_i - \bar{Y})^2$. The RSS, $\sum \hat{u}_i^2$, however, depends on the number of regressors present in the 
model. Intuitively, it is clear that as the number of X variables increases, $\sum \hat{u}_i^2$ is likely to 
decrease (at least it will not increase); hence $R^2$ as defined in Eq. (7.8.1) will increase. In 
view of this, in comparing two regression models with the same dependent variable but 
differing number of X variables, one should be very wary of choosing the model with the 
highest $R^2$. 
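A quick numerical check of this claim: the sketch below, on simulated data (any data set would do), fits one regression with a constant and $X_2$, and a second one that also includes a regressor generated as pure noise, unrelated to Y by construction. TSS is the same in both fits while RSS cannot rise, so $R^2$ does not fall. The helper `r_squared` is our own.

```python
import numpy as np

def r_squared(y, X):
    """R^2 = 1 - RSS/TSS for an OLS fit of y on X (X already includes a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    rss = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - rss / tss

rng = np.random.default_rng(1)
n = 30
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x2 + rng.normal(size=n)
noise = rng.normal(size=n)   # irrelevant regressor, unrelated to y by construction

X_small = np.column_stack([np.ones(n), x2])
X_big = np.column_stack([X_small, noise])

r2_small = r_squared(y, X_small)
r2_big = r_squared(y, X_big)
print(round(r2_small, 4), round(r2_big, 4), r2_big >= r2_small)
```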

To compare two $R^2$ terms, one must take into account the number of X variables present 
in the model. This can be done readily if we consider an alternative coefficient of 
determination, which is as follows: 


$$\bar{R}^2 = 1 - \frac{\sum \hat{u}_i^2/(n-k)}{\sum y_i^2/(n-1)} \qquad (7.8.2)$$
where k = the number of parameters in the model including the intercept term. (In the 
three-variable regression, k = 3. Why?) The $\bar{R}^2$ thus defined is known as the adjusted $R^2$, 
denoted by $\bar{R}^2$. The term adjusted means adjusted for the df associated with the sums of 
squares entering into Eq. (7.8.1): $\sum \hat{u}_i^2$ has n − k df in a model involving k parameters, 
which include the intercept term, and $\sum y_i^2$ has n − 1 df. (Why?) For the three-variable 
case, we know that $\sum \hat{u}_i^2$ has n − 3 df. 

Equation (7.8.2) can also be written as 

$$\bar{R}^2 = 1 - \frac{\hat{\sigma}^2}{S_Y^2} \qquad (7.8.3)$$

where $\hat{\sigma}^2$ is the residual variance, an unbiased estimator of the true $\sigma^2$, and $S_Y^2$ is the sample 
variance of Y. 

It is easy to see that $\bar{R}^2$ and $R^2$ are related because, substituting Eq. (7.8.1) into 
Eq. (7.8.2), we obtain 

$$\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k} \qquad (7.8.4)$$


It is immediately apparent from Eq. (7.8.4) that (1) for k > 1, $\bar{R}^2 < R^2$, which implies that 
as the number of X variables increases, the adjusted $R^2$ increases less than the unadjusted 
$R^2$; and (2) $\bar{R}^2$ can be negative, although $R^2$ is necessarily nonnegative.¹⁰ In case $\bar{R}^2$ turns 
out to be negative in an application, its value is taken as zero. 
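Eq. (7.8.4) is easy to compute directly. In the sketch below, `adjusted_r2` is a hypothetical helper name; the numbers used for the child mortality regression (7.6.2), $R^2 = 0.7077$, n = 64, and k = 3, are those given later in this section, with n inferred from (n − 1) = 63. The last two lines illustrate properties (1) and (2) with arbitrary values.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2, Eq. (7.8.4): 1 - (1 - R^2)(n - 1)/(n - k).

    n is the sample size and k the number of parameters, intercept included.
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Child mortality regression (7.6.2): R^2 = 0.7077, n = 64, k = 3
print(round(adjusted_r2(0.7077, 64, 3), 4))  # 0.6981

# Property (1): with k > 1 and R^2 < 1, the adjusted value is smaller than R^2
print(adjusted_r2(0.5, 20, 4) < 0.5)  # True

# Property (2): it can be negative; with R^2 = 0 it equals (1 - k)/(n - k)
print(adjusted_r2(0.0, 20, 4), (1 - 4) / (20 - 4))  # -0.1875 -0.1875
```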

Which $R^2$ should one use in practice? As Theil notes: 

... it is good practice to use $\bar{R}^2$ rather than $R^2$ because $R^2$ tends to give an overly optimistic picture 
of the fit of the regression, particularly when the number of explanatory variables is not very small 
compared with the number of observations.¹¹ 

But Theil’s view is not uniformly shared, for he has offered no general theoretical 
justification for the “superiority” of $\bar{R}^2$. For example, Goldberger argues that the following $R^2$, call 
it modified $R^2$, will do just as well:¹² 


$$\text{Modified } R^2 = \left(1 - \frac{k}{n}\right) R^2 \qquad (7.8.5)$$


His advice is to report $R^2$, n, and k and let the reader decide how to adjust $R^2$ by allowing 
for n and k. 
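Following that advice, a reader who has $R^2$, n, and k can compute either correction. The sketch below sets Goldberger's modified $R^2$ of Eq. (7.8.5) beside the adjusted $R^2$ of Eq. (7.8.4), using the child mortality figures for regression (7.6.2) ($R^2 = 0.7077$, n = 64, k = 3); both helper names are illustrative only.

```python
def modified_r2(r2, n, k):
    """Goldberger's modified R^2, Eq. (7.8.5): (1 - k/n) * R^2."""
    return (1.0 - k / n) * r2

def adjusted_r2(r2, n, k):
    """Adjusted R^2, Eq. (7.8.4), shown for comparison."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

r2, n, k = 0.7077, 64, 3   # child mortality regression (7.6.2)
print(round(modified_r2(r2, n, k), 4))  # 0.6745
print(round(adjusted_r2(r2, n, k), 4))  # 0.6981
```

Unlike the adjusted $R^2$, the modified $R^2$ stays between 0 and $R^2$ whenever k < n, since it only rescales $R^2$ by a factor between 0 and 1.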


¹⁰ Note, however, that if $R^2 = 1$, $\bar{R}^2 = R^2 = 1$. When $R^2 = 0$, $\bar{R}^2 = (1 - k)/(n - k)$, in which case $\bar{R}^2$ 
can be negative if k > 1. 

¹¹ Henri Theil, Introduction to Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1978, p. 135. 

¹² Arthur S. Goldberger, A Course in Econometrics, Harvard University Press, Cambridge, Mass., 1991, 
p. 178. For a more critical view of $R^2$, see S. Cameron, "Why Is the R Squared Adjusted Reported?" 
Journal of Quantitative Economics, vol. 9, no. 1, January 1993, pp. 183-186. He argues that "It [$\bar{R}^2$] is 
NOT a test statistic and there seems to be no clear intuitive justification for its use as a descriptive 
statistic. Finally, we should be clear that it is not an effective tool for the prevention of data mining" 
(p. 186). 
Despite this advice, it is the adjusted $R^2$, as given in Eq. (7.8.4), that is reported by most 
statistical packages along with the conventional $R^2$. The reader is well advised to treat $\bar{R}^2$ 
as just another summary statistic. 

Incidentally, for the child mortality regression (7.6.2), the reader should verify that $\bar{R}^2$ 
is 0.6981, keeping in mind that in this example (n − 1) = 63 and (n − k) = 61. As 
expected, $\bar{R}^2$ of 0.6981 is less than $R^2$ of 0.7077. 

Besides $R^2$ and adjusted $R^2$ as goodness of fit measures, other criteria are often used to 
judge the adequacy of a regression model. Two of these are Akaike’s information 
criterion and Amemiya’s prediction criterion, which are used to select between competing 
models. We will discuss these criteria when we consider the problem of model selection in 
greater detail in a later chapter (see Chapter 13). 


Comparing Two $R^2$ Values

It is crucial to note that in comparing two models on the basis of the coefficient of 
determination, whether adjusted or not, the sample size n and the dependent variable must be the 
same; the explanatory variables may take any form. Thus for the models 


$$\ln Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (7.8.6)$$

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + u_i \qquad (7.8.7)$$

the computed $R^2$ terms cannot be compared. The reason is as follows: By definition, 
$R^2$ measures the proportion of the variation in the dependent variable accounted for by the 
explanatory variable(s). Therefore, in Eq. (7.8.6) $R^2$ measures the proportion of the 
variation in ln Y explained by $X_2$ and $X_3$, whereas in Eq. (7.8.7) it measures the proportion of the 
variation in Y, and the two are not the same thing: As noted in Chapter 6, a change in ln Y 
gives a relative or proportional change in Y, whereas a change in Y gives an absolute 
change. Therefore, $\operatorname{var}\hat{Y}_i/\operatorname{var}Y_i$ is not equal to $\operatorname{var}(\ln \hat{Y}_i)/\operatorname{var}(\ln Y_i)$; that is, the two 
coefficients of determination are not the same.¹³ 

How then does one compare the $R^2$’s of two models when the regressand is not in the 
same form? To answer this question, let us first consider a numerical example. 


¹³ From the definition of $R^2$, we know that 

$$1 - R^2 = \frac{\text{RSS}}{\text{TSS}} = \frac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2}$$

for the linear model and 

$$1 - R^2 = \frac{\sum \hat{u}_i^2}{\sum (\ln Y_i - \overline{\ln Y})^2}$$

for the log model. Since the denominators on the right-hand sides of these expressions are different, 
we cannot compare the two $R^2$ terms directly. 

As shown in Example 7.2, for the linear specification, the RSS = 0.1491 (the residual sum of 
squares of coffee consumption), and for the log-linear specification, the RSS = 0.0226 (the residual 
sum of squares of log of coffee consumption). These residuals are of different orders of magnitude 
and hence are not directly comparable. 
EXAMPLE 7.2 
Coffee Consumption in the United States, 1970-1980 


Consider the data in Table 7.1. The data pertain to consumption of cups of coffee per day 
(Y) and real retail price of coffee (X) in the United States for years 1970-1980. Applying 
OLS to the data, we obtain the following regression results: 

$$\hat{Y}_t = 2.6911 - 0.4795X_t \qquad (7.8.8)$$

se = (0.1216) (0.1140)    RSS = 0.1491; $r^2 = 0.6628$


The results make economic sense: For every one-dollar increase in the real price of a pound of coffee, 
average coffee consumption goes down by about half a cup per person per day. The $r^2$ value of about 0.66 means that 
the price of coffee explains about 66 percent of the variation in coffee consumption. The 
reader can readily verify that the slope coefficient is statistically significant. 

From the same data, the following double-log, or constant elasticity, model can be 
estimated: 


$$\widehat{\ln Y}_t = 0.7774 - 0.2530 \ln X_t \qquad (7.8.9)$$

se = (0.0152) (0.0494)    RSS = 0.0226; $r^2 = 0.7448$


Since this is a double-log model, the slope coefficient gives a direct estimate of the price 
elasticity coefficient. In the present instance, it tells us that if the price of coffee per pound 
goes up by 1 percent, on average, per day coffee consumption goes down by about 
0.25 percent. Remember that in the linear model (7.8.8) the slope coefficient only gives 
the rate of change of coffee consumption with respect to price. (How will you estimate the 
price elasticity for the linear model?) The $r^2$ value of about 0.74 means that about 74 
percent of the variation in the log of coffee demand is explained by the variation in the log of 
coffee price. 
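The two coffee regressions can be reproduced, as a sketch, from the Table 7.1 data with NumPy's `polyfit`; up to rounding, the printed intercepts, slopes, RSS, and $r^2$ values should agree with (7.8.8) and (7.8.9). The last lines answer the question in parentheses in one common way, assuming the elasticity of the linear model is evaluated at the sample means of X and Y; other evaluation points are equally legitimate. The helper `fit_line` is our own.

```python
import numpy as np

# Table 7.1: cups per person per day (Y) and real price in $ per lb (X), 1970-1980
Y = np.array([2.57, 2.50, 2.35, 2.30, 2.25, 2.20, 2.11, 1.94, 1.97, 2.06, 2.02])
X = np.array([0.77, 0.74, 0.72, 0.73, 0.76, 0.75, 1.08, 1.81, 1.39, 1.20, 1.17])

def fit_line(x, y):
    """Two-variable OLS: returns (intercept, slope, RSS, r^2)."""
    slope, intercept = np.polyfit(x, y, 1)   # polyfit lists the highest power first
    resid = y - (intercept + slope * x)
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    return intercept, slope, rss, 1.0 - rss / tss

lin = fit_line(X, Y)                       # linear model (7.8.8)
loglog = fit_line(np.log(X), np.log(Y))    # double-log model (7.8.9)
print("linear :", [round(v, 4) for v in lin])
print("log-log:", [round(v, 4) for v in loglog])

# Price elasticity implied by the linear model, evaluated at the sample means:
# slope * (mean X / mean Y); the slope alone is a rate of change, not an elasticity.
elasticity_at_means = lin[1] * X.mean() / Y.mean()
print(round(elasticity_at_means, 4))
```

At the means, the linear model implies an elasticity of roughly −0.22, of the same order as the constant elasticity of about −0.25 from the double-log model.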

TABLE 7.1 
U.S. Coffee Consumption (Y) in Relation to Average Real Retail Price (X),* 1970-1980 

        Y                        X 
Year    Cups per Person per Day  $ per lb 
1970    2.57                     0.77 
1971    2.50                     0.74 
1972    2.35                     0.72 
1973    2.30                     0.73 
1974    2.25                     0.76 
1975    2.20                     0.75 
1976    2.11                     1.08 
1977    1.94                     1.81 
1978    1.97                     1.39 
1979    2.06                     1.20 
1980    2.02                     1.17 

*Note: The nominal price was divided by the Consumer Price Index (CPI) for food and beverages, 1967 = 100. 
Source: The data for Y are from Summary of National Coffee Drinking Study, Data Group, Elkins Park, Penn., … 
prices) are from Nielsen Food … I am indebted to Scott E. Sandberg for collecting the data. 

Since the $r^2$ value of the linear model, 0.6628, is smaller than the $r^2$ value of 0.7448 
of the log-linear model, you might be tempted to choose the latter model because of its 
higher $r^2$ value. But for reasons already noted, we cannot do so. If you do want to 
compare the two $r^2$ values, however, you may proceed as follows: 

1. Obtain $\widehat{\ln Y}_t$ from Eq. (7.8.9) for each observation; that is, obtain the estimated log 
value of each observation from this model. Take the antilog of these values and then 
compute $r^2$ between these antilog values and actual $Y_t$ in the manner indicated by 
Eq. (3.5.14). This $r^2$ value is comparable to the $r^2$ value of the linear model (7.8.8). 

2. Alternatively, assuming all Y values (and all fitted values $\hat{Y}_t$) are positive, take logarithms of the Y values, ln Y. 
Obtain the estimated Y values, $\hat{Y}_t$, from the linear model (7.8.8), take the logarithms of 
these estimated Y values (i.e., $\ln \hat{Y}_t$), and compute the $r^2$ between $\ln Y_t$ and $\ln \hat{Y}_t$ in 
the manner indicated in Eq. (3.5.14). This $r^2$ value is comparable to the $r^2$ value 
obtained from Eq. (7.8.9). 

For our coffee example, we present the necessary raw data to compute the comparable 
$r^2$'s in Table 7.2. To compare the $r^2$ value of the linear model (7.8.8) with that of (7.8.9), 
we first obtain the log of $\hat{Y}_t$ (given in column [6] of Table 7.2), then we obtain the log of 
actual Y values (given in column [5] of the table), and then compute $r^2$ between these two 
sets of values using Eq. (3.5.14). The result is an $r^2$ value of 0.6779, which is now 
comparable with the $r^2$ value of the log-linear model of 0.7448. The difference between the two 
$r^2$ values is about 0.07. 

On the other hand, if we want to compare the $r^2$ value of the log-linear model with the 
linear model, we obtain $\widehat{\ln Y}_t$ for each observation from Eq. (7.8.9) (given in column [3] of 
the table), obtain their antilog values (given in column [4] of the table), and finally compute 
$r^2$ between these antilog values and the actual Y values, using formula (3.5.14). This will 
give an $r^2$ value of 0.7187, which is slightly higher than that obtained from the linear model 
(7.8.8), namely, 0.6628. 

Using either method, it seems that the log-linear model gives a slightly better fit. 

TABLE 7.2 
Raw Data for Comparing Two $R^2$ Values 



        (1)     (2)         (3)            (4)               (5)         (6) 
Year    Y       Ŷ           ln Y, fitted   Antilog of (3)    ln Y        ln Ŷ 
1970    2.57    2.321887    0.843555       2.324616          0.943906    0.842380 
1971    2.50    2.336272    0.853611       2.348111          0.916291    0.848557 
1972    2.35    2.345863    0.860544       2.364447          0.854415    0.852653 
1973    2.30    2.341068    0.857054       2.356209          0.832909    0.850607 
1974    2.25    2.326682    0.846863       2.332318          0.810930    0.844443 
1975    2.20    2.331477    0.850214       2.340149          0.788457    0.846502 
1976    2.11    2.173233    0.757943       2.133882          0.746688    0.776216 
1977    1.94    1.823176    0.627279       1.872508          0.662688    0.600580 
1978    1.97    2.024579    0.694089       2.001884          0.678034    0.705362 
1979    2.06    2.115689    0.731282       2.077742          0.722706    0.749381 
1980    2.02    2.130075    0.737688       2.091096          0.703098    0.756157 

Notes: Column (1): Actual Y values from Table 7.1. 
Column (2): Estimated Y values from the linear model (7.8.8). 
Column (3): Estimated log Y values from the double-log model (7.8.9). 
Column (4): Antilog of values in column (3). 
Column (5): Log values of Y in column (1). 
Column (6): Log values of $\hat{Y}_t$ in column (2). 
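Both procedures can be sketched in Python from the Table 7.1 data. We assume that the $r^2$ of Eq. (3.5.14) is the squared correlation between the two series being compared and compute it with `numpy.corrcoef`; the variable names and the helper `r2_between` are our own. The two results should match the 0.7187 and 0.6779 reported in the text to about three decimals.

```python
import numpy as np

# Table 7.1 data
Y = np.array([2.57, 2.50, 2.35, 2.30, 2.25, 2.20, 2.11, 1.94, 1.97, 2.06, 2.02])
X = np.array([0.77, 0.74, 0.72, 0.73, 0.76, 0.75, 1.08, 1.81, 1.39, 1.20, 1.17])

def r2_between(a, b):
    """Squared correlation between two series (assumed form of Eq. (3.5.14))."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)

b_lin = np.polyfit(X, Y, 1)                  # linear model (7.8.8)
b_log = np.polyfit(np.log(X), np.log(Y), 1)  # double-log model (7.8.9)

Y_hat_lin = np.polyval(b_lin, X)             # column (2) of Table 7.2
lnY_hat_log = np.polyval(b_log, np.log(X))   # column (3) of Table 7.2

# Method 1: antilog of the log model's fitted values vs actual Y (columns 4 and 1)
r2_log_in_levels = r2_between(np.exp(lnY_hat_log), Y)
# Method 2: log of the linear model's fitted values vs ln Y (columns 6 and 5)
r2_lin_in_logs = r2_between(np.log(Y_hat_lin), np.log(Y))

print(round(r2_log_in_levels, 4), round(r2_lin_in_logs, 4))
```

Each comparable value is measured on the same scale as the other model's own $r^2$: the first against 0.6628 (levels), the second against 0.7448 (logs).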
Allocating $R^2$ among Regressors 

Let us return to our child mortality example. We saw in Eq. (7.6.2) that the two regressors 
PGNP and FLR explain 0.7077 or 70.77 percent of the variation in child mortality. But now 
consider the regression (7.7.2) where we dropped the FLR variable and as a result the 
$r^2$ value dropped to 0.1662. Does that mean the difference in the $r^2$ value of 0.5415 
(0.7077 − 0.1662) is attributable to the dropped variable FLR? On the other hand, if you 
consider regression (7.7.3), where we dropped the PGNP variable, the $r^2$ value drops to 
0.6696. Does that mean the difference in the $r^2$ value of 0.0381 (0.7077 − 0.6696) is due 
to the omitted variable PGNP? 

The question then is: Can we allocate the multiple $R^2$ of 0.7077 between the two 
regressors, PGNP and FLR, in this manner? Unfortunately, we cannot do so, for the allocation 
depends on the order in which the regressors are introduced, as we just illustrated. Part of 
the problem here is that the two regressors are correlated, the correlation coefficient 
between the two being 0.2685 (verify it from the data given in Table 6.4). In most applied 
work with several regressors, correlation among them is a common problem. Of course, the 
problem will be very serious if there is perfect collinearity among the regressors. 

The best practical advice is that there is little point in trying to allocate the $R^2$ value to 
its constituent regressors. 
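The order dependence can be made concrete with the three $r^2$ values already reported for the child mortality data. The short sketch below (plain Python, no data needed; variable names are our own) credits each regressor with its own $r^2$ when it enters first and with the increase in $R^2$ it brings when it enters last.

```python
# r^2 values reported in the text for the child mortality data
r2_both = 0.7077       # CM on PGNP and FLR, Eq. (7.6.2)
r2_pgnp_only = 0.1662  # CM on PGNP alone, Eq. (7.7.2)
r2_flr_only = 0.6696   # CM on FLR alone, Eq. (7.7.3)

# Credit given to each regressor if it enters LAST: the increase in R^2 it brings
flr_last = r2_both - r2_pgnp_only    # 0.5415
pgnp_last = r2_both - r2_flr_only    # 0.0381

# Credit given to each regressor if it enters FIRST: its own r^2
flr_first, pgnp_first = r2_flr_only, r2_pgnp_only

print(f"FLR : {flr_first:.4f} if first, {flr_last:.4f} if last")
print(f"PGNP: {pgnp_first:.4f} if first, {pgnp_last:.4f} if last")
# Neither scheme splits the joint R^2 exactly
print(f"sum if last = {flr_last + pgnp_last:.4f}, "
      f"sum if first = {flr_first + pgnp_first:.4f}, joint R^2 = {r2_both:.4f}")
```

The "last" shares sum to less than 0.7077 and the "first" shares to more, so neither ordering yields a unique split.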

The "Game" of Maximizing $R^2$

In concluding this section, a warning is in order: Sometimes researchers play the game of 
maximizing $R^2$, that is, choosing the model that gives the highest $R^2$. But this may be 
dangerous, for in regression analysis our objective is not to obtain a high $R^2$ per se but rather to 
obtain dependable estimates of the true population regression coefficients and draw 
statistical inferences about them. In empirical analysis it is not unusual to obtain a very high $R^2$ but 
find that some of the regression coefficients either are statistically insignificant or have signs 
that are contrary to a priori expectations. Therefore, the researcher should be more 
concerned about the logical or theoretical relevance of the explanatory variables to the 
dependent variable and their statistical significance. If in this process we obtain a high $R^2$, well and 
good; on the other hand, if $R^2$ is low, it does not mean the model is necessarily bad.¹⁴ 
As a matter of fact, Goldberger is very critical about the role of $R^2$. He has said: 

From our perspective, $R^2$ has a very modest role in regression analysis, being a measure of 
the goodness of fit of a sample LS [least-squares] linear regression in a body of data. Nothing 
in the CR [CLRM] model requires that $R^2$ be high. Hence a high $R^2$ is not evidence in favor of 
the model and a low $R^2$ is not evidence against it. 

In fact the most important thing about $R^2$ is that it is not important in the CR model. 
The CR model is concerned with parameters in a population, not with goodness of fit in the 
sample.... If one insists on a measure of predictive success (or rather failure), then $\sigma^2$ might 
suffice: after all, the parameter $\sigma^2$ is the expected squared forecast error that would result if 
the population CEF [PRF] were used as the predictor. Alternatively, the squared standard error 
of forecast... at relevant values of x [regressors] may be informative.¹⁵ 

¹⁴ Some authors would like to deemphasize the use of $R^2$ as a measure of goodness of fit as well as its 
use for comparing two or more $R^2$ values. See Christopher H. Achen, Interpreting and Using 
Regression, Sage Publications, Beverly Hills, Calif., 1982, pp. 58-67, and C. Granger and P. Newbold, 
"$R^2$ and the Transformation of Regression Variables," Journal of Econometrics, vol. 4, 1976, pp. 205-210. 
Incidentally, the practice of choosing a model on the basis of highest $R^2$, a kind of data mining, 
introduces what is known as pretest bias, which might destroy some of the properties of OLS estimators 
of the classical linear regression model. On this topic, the reader may want to consult George G. 
Judge, Carter R. Hill, William E. Griffiths, Helmut Lütkepohl, and Tsoung-Chao Lee, Introduction to the 
Theory and Practice of Econometrics, John Wiley, New York, 1982, Chapter 21. 


7.9 The Cobb-Douglas Production Function: More on Functional Form 

In Section 6.4 we showed how with appropriate transformations we can convert nonlinear 
relationships into linear ones so that we can work within the framework of the classical 
linear regression model. The various transformations discussed there in the context of the 
two-variable case can be easily extended to multiple regression models. We demonstrate 
transformations in this section by taking up the multivariable extension of the two-variable 
log-linear model; others can be found in the exercises and in the illustrative examples 
discussed throughout the rest of this book. The specific example we discuss is the 
celebrated Cobb-Douglas production function of production theory. 

The Cobb-Douglas production function, in its stochastic form, may be expressed as 

$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i} \qquad (7.9.1)$$

where Y = output 
$X_2$ = labor input 
$X_3$ = capital input 
u = stochastic disturbance term 
e = base of natural logarithm 

From Eq. (7.9.1) it is clear that the relationship between output and the two inputs is 
nonlinear. However, if we log-transform this model, we obtain: 

$$\begin{aligned} \ln Y_i &= \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \\ &= \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \end{aligned} \qquad (7.9.2)$$

where $\beta_0 = \ln \beta_1$. 

Thus written, the model is linear in the parameters $\beta_0$, $\beta_2$, and $\beta_3$ and is therefore a 
linear regression model. Notice, though, it is nonlinear in the variables Y and X but linear in 
the logs of these variables. In short, Eq. (7.9.2) is a log-log, double-log, or log-linear 
model, the multiple regression counterpart of the two-variable log-linear model (6.5.3). 

The properties of the Cobb-Douglas production function are quite well known: 

1. $\beta_2$ is the (partial) elasticity of output with respect to the labor input, that is, it measures 
the percentage change in output for, say, a 1 percent change in the labor input, holding the 
capital input constant (see Exercise 7.9). 

2. Likewise, $\beta_3$ is the (partial) elasticity of output with respect to the capital input, 
holding the labor input constant. 

3. The sum ($\beta_2 + \beta_3$) gives information about the returns to scale, that is, the response 
of output to a proportionate change in the inputs. If this sum is 1, then there are constant 
returns to scale, that is, doubling the inputs will double the output, tripling the inputs will 
triple the output, and so on. If the sum is less than 1, there are decreasing returns to scale: 
doubling the inputs will less than double the output. Finally, if the sum is greater than 1, 
there are increasing returns to scale: doubling the inputs will more than double the output. 

¹⁵ Arthur S. Goldberger, op. cit., pp. 177-178. 

Before proceeding further, note that whenever you have a log-linear regression model 
involving any number of variables, the coefficient of each of the X variables measures the 
(partial) elasticity of the dependent variable Y with respect to that variable. Thus, if you have 
a k-variable log-linear model: 

$$\ln Y_i = \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + \cdots + \beta_k \ln X_{ki} + u_i \qquad (7.9.3)$$

each of the (partial) regression coefficients, $\beta_2$ through $\beta_k$, is the (partial) elasticity of Y 
with respect to variables $X_2$ through $X_k$.¹⁶ 

EXAMPLE 7.3 
Value Added, Labor Hours, and Capital Input in the Manufacturing Sector 

To illustrate the Cobb-Douglas production function, we obtained the data shown in 
Table 7.3; these data are for the manufacturing sector of all 50 states and Washington, DC, 
for 2005. 

Assuming that the model (7.9.2) satisfies the assumptions of the classical linear 
regression model,¹⁷ we obtained the following regression by the OLS method (see Appendix 7A, 
Section 7A.5 for the computer printout): 

TABLE 7.3 
Value Added, Labor Hours, and Capital Input in the Manufacturing Sector of the U.S., 2005 
Source: 2005 Annual Survey of Manufacturers, Sector 31: Supplemental Statistics 


                        Output              Labor Input      Capital Input 
                        Value Added         Worker Hrs       Capital Expenditure 
                        (thousands of $)    (thousands)      (thousands of $) 
Area                    Y                   X2               X3 
Alabama                 38,372,840          424,471          2,689,076 
Alaska                  1,805,427           19,895           57,997 
Arizona                 23,736,129          206,893          2,308,272 
Arkansas                26,981,983          304,055          1,376,235 
California              217,546,032         1,809,756        13,554,116 
Colorado                19,462,751          180,366          1,790,751 
Connecticut             28,972,772          224,267          1,210,229 
Delaware                14,313,157          54,455           421,064 
District of Columbia    159,921             2,029            7,188 
Florida                 47,289,846          471,211          2,761,281 
Georgia                 63,015,125          659,379          3,540,475 
Hawaii                  1,809,052           17,528           146,371 
Idaho                   10,511,786          75,414           848,220 
Illinois                105,324,866         963,156          5,870,409 
Indiana                 90,120,459          835,083          5,832,503 
Iowa                    39,079,550          336,159          1,795,976 
Kansas                  22,826,760          246,144          1,595,118 
Kentucky                38,686,340          384,484          2,503,693 
Louisiana               69,910,555          216,149          4,726,625 
Maine                   7,856,947           82,021           415,131 
Maryland                21,352,966          174,855          1,729,116 
Massachusetts           46,044,292          355,701          2,706,065 
Michigan                92,335,528          943,298          5,294,356 
Minnesota               48,304,274          456,553          2,833,525 
Mississippi             17,207,903          267,806          1,212,281 
Missouri                47,340,157          439,427          2,404,122 
Montana                 2,644,567           24,167           334,008 
Nebraska                14,650,080          163,637          627,806 
Nevada                  7,290,360           59,737           522,335 
New Hampshire           9,188,322           96,106           507,488 
New Jersey              51,298,516          407,076          3,295,056 
New Mexico              20,401,410          43,079           404,749 
New York                87,756,129          727,177          4,260,353 
North Carolina          101,268,432         820,013          4,086,558 
North Dakota            3,556,025           34,723           184,700 
Ohio                    124,986,166         1,174,540        6,301,421 
Oklahoma                20,451,196          201,284          1,327,353 
Oregon                  34,808,109          257,820          1,456,683 
Pennsylvania            104,858,322         944,998          5,896,392 
Rhode Island            6,541,356           68,987           297,618 
South Carolina          37,668,126          400,317          2,500,071 
South Dakota            4,988,905           56,524           311,251 
Tennessee               62,828,100          582,241          4,126,465 
Texas                   172,960,157         1,120,382        11,588,283 
Utah                    15,702,637          150,030          762,671 
Vermont                 5,418,786           48,134           276,293 
Virginia                49,166,991          425,346          2,731,669 
Washington              46,164,427          313,279          1,945,860 
West Virginia           9,185,967           89,639           685,587 
Wisconsin               66,964,978          694,628          3,902,823 
Wyoming                 2,979,475           15,221           361,536 

¹⁶ To see this, differentiate Eq. (7.9.3) partially with respect to the log of each X variable. Therefore, 
$\partial \ln Y/\partial \ln X_2 = (\partial Y/\partial X_2)(X_2/Y) = \beta_2$, which, by definition, is the elasticity of Y with respect to $X_2$, 
and $\partial \ln Y/\partial \ln X_3 = (\partial Y/\partial X_3)(X_3/Y) = \beta_3$, which is the elasticity of Y with respect to $X_3$, and so on. 

¹⁷ Notice that in the Cobb-Douglas production function (7.9.1) we have introduced the stochastic 
error term in a special way so that in the resulting logarithmic transformation it enters in the usual 
linear form. On this, see Section 6.9. 


$$\widehat{\ln Y}_i = 3.8876 + 0.4683 \ln X_{2i} + 0.5213 \ln X_{3i} \qquad (7.9.4)$$

se = (0.3962) (0.0989) (0.0969) 
t = (9.8115) (4.7342) (5.3803) 
$R^2 = 0.9642$    df = 48 
$\bar{R}^2 = 0.9627$ 

From Eq. (7.9.4) we see that in the U.S. manufacturing sector for 2005, the output 
elasticities of labor and capital were 0.4683 and 0.5213, respectively. In other words, over the 
50 U.S. states and the District of Columbia, holding the capital input constant, a 1 percent 
increase in the labor input led on the average to about a 0.47 percent increase in the 
output. Similarly, holding the labor input constant, a 1 percent increase in the capital input 
led on the average to about a 0.52 percent increase in the output. Adding the two output 
elasticities, we obtain 0.9896 (about 0.99), which gives the value of the returns to scale parameter. As is 
evident, the manufacturing sector for the 50 U.S. states and the District of Columbia 
was characterized by approximately constant returns to scale. 

From a purely statistical viewpoint, the estimated regression line fits the data quite well. 
The $R^2$ value of 0.9642 means that about 96 percent of the variation in the (log of) output is 
explained by the (logs of) labor and capital. In Chapter 8, we shall see how the estimated 
standard errors can be used to test hypotheses about the "true" values of the parameters of 
the Cobb-Douglas production function for the U.S. manufacturing sector of the economy. 
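To see how the log transformation turns Eq. (7.9.1) into an ordinary OLS problem, the sketch below simulates hypothetical Cobb-Douglas data in logs, with made-up parameters chosen only to be roughly in line with (7.9.4) (it does not use the Table 7.3 data), fits Eq. (7.9.2) by least squares, and sums the two estimated elasticities to obtain the returns-to-scale parameter.

```python
import numpy as np

# Hypothetical parameters for ln Y = beta0 + beta2 ln X2 + beta3 ln X3 + u, Eq. (7.9.2);
# chosen only to be roughly in line with (7.9.4), not estimated from Table 7.3.
beta0, beta2, beta3 = 3.9, 0.47, 0.52

rng = np.random.default_rng(2005)
n = 500
ln_labor = rng.normal(12.0, 1.0, n)                     # ln X2
ln_capital = 0.8 * ln_labor + rng.normal(2.0, 0.5, n)   # ln X3, correlated with ln X2
ln_output = beta0 + beta2 * ln_labor + beta3 * ln_capital + rng.normal(0.0, 0.05, n)

# The model is linear in the parameters, so OLS on the logs applies directly
X = np.column_stack([np.ones(n), ln_labor, ln_capital])
coef, *_ = np.linalg.lstsq(X, ln_output, rcond=None)
b0_hat, b2_hat, b3_hat = coef

returns_to_scale = b2_hat + b3_hat   # sum of the two output elasticities
print("elasticities:", round(b2_hat, 3), round(b3_hat, 3))
print("returns to scale:", round(returns_to_scale, 3))
# In Eq. (7.9.1), doubling both inputs multiplies output by 2**(beta2 + beta3)
print("output ratio when inputs double:", round(2 ** returns_to_scale, 3))

# The same sum computed from the reported estimates in (7.9.4)
rts_reported = 0.4683 + 0.5213
print("reported returns to scale:", round(rts_reported, 4))
```

With a sum of 0.9896, doubling both inputs would multiply output by about $2^{0.9896} \approx 1.99$, which is why the text describes the sector as close to constant returns to scale.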
7.10 Polynomial Regression Models 

We now consider a class of multiple regression models, the polynomial regression 
models, that have found extensive use in econometric research relating to cost and 
production functions. In introducing these models, we further extend the range of models to which 
the classical linear regression model can easily be applied. 

To fix the ideas, consider Figure 7.1, which relates the short-run marginal cost (MC) of 
production (Y) of a commodity to the level of its output (X). The visually drawn MC curve 
in the figure, the textbook U-shaped curve, shows that the relationship between MC and 
output is nonlinear. If we were to quantify this relationship from the given scatterpoints, 
how would we go about it? In other words, what type of econometric model would capture 
first the declining and then the increasing nature of marginal cost? 

Geometrically, the MC curve depicted in Figure 7.1 represents a parabola. 
Mathematically, the parabola is represented by the following equation: 

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 \qquad (7.10.1)$$

which is called a quadratic function, or more generally, a second-degree polynomial in the 
variable X; the highest power of X represents the degree of the polynomial (if $X^3$ were 
added to the preceding function, it would be a third-degree polynomial, and so on). 

The stochastic version of Eq. (7.10.1) may be written as 

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i \qquad (7.10.2)$$

which is called a second-degree polynomial regression. 

The general kth degree polynomial regression may be written as 

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_k X_i^k + u_i \qquad (7.10.3)$$

Notice that in these types of polynomial regressions there is only one explanatory variable 
on the right-hand side but it appears with various powers, thus making them multiple 
regression models. Incidentally, note that if $X_i$ is assumed to be fixed or nonstochastic, the 
powered terms of $X_i$ also become fixed or nonstochastic. 

Do these models present any special estimation problems? Since the second-degree 
polynomial (7.10.2) or the kth-degree polynomial (7.10.3) is linear in the parameters, the 
β's, they can be estimated by the usual OLS or ML methodology. But what about the 
collinearity problem? Aren’t the various X's highly correlated since they are all powers of 
X? Yes, but remember that terms like $X^2$, $X^3$, $X^4$, etc., are all nonlinear functions of X and 
hence, strictly speaking, do not violate the assumption of no perfect multicollinearity, which rules out 
only exact linear relationships among the regressors. In short, polynomial regression models can be 
estimated by the techniques presented in this chapter and present no new estimation problems. 

FIGURE 7.1 The U-shaped marginal cost curve. 
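A minimal sketch, on simulated data with hypothetical coefficients, of why polynomial regressions need no new machinery: build the columns 1, X, $X^2$, $X^3$ (here with `numpy.vander`) and run ordinary least squares. The correlation matrix printed at the end shows how strongly the powers of X are correlated, while the rank check confirms that none is an exact linear function of the others.

```python
import numpy as np

# Hypothetical cubic used only for illustration: E(Y|X) = 10 + 5X - 1.5X^2 + 0.1X^3
true = np.array([10.0, 5.0, -1.5, 0.1])   # beta_0, beta_1, beta_2, beta_3
rng = np.random.default_rng(7)
x = np.linspace(1.0, 10.0, 40)

# Columns 1, X, X^2, X^3: the model is nonlinear in X but linear in the betas
X = np.vander(x, N=4, increasing=True)
y = X @ true + rng.normal(0.0, 0.3, x.size)

# Ordinary least squares, exactly as for any multiple regression
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated betas:", np.round(beta_hat, 3))

# The powers of X are highly correlated with one another ...
print(np.round(np.corrcoef(X[:, 1:], rowvar=False), 3))
# ... but none is an exact linear function of the others, so X has full column rank
print("rank of X:", np.linalg.matrix_rank(X))
```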


EXAMPLE 7.4 

As an example of the polynomial regression, consider the data on output and total cost of 
production of a commodity in the short run given in Table 7.4. What type of regression 
model will fit these data? For this purpose, let us first draw the scattergram, which is 
shown in Figure 7.2. 

From this figure it is clear that the relationship between total cost and output 
resembles the elongated S curve; notice how the total cost curve first increases gradually and 
then rapidly, as predicted by the celebrated law of diminishing returns. This S shape of the 
total cost curve can be captured by the following cubic or third-degree polynomial: 

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i \qquad (7.10.4)$$

where Y = total cost and X = output. 

FIGURE 7.2 The total cost curve. 

FIGURE 7.3 Short-run cost functions. 

Given the data of Table 7.4, we can apply the OLS method to estimate the parameters 
of Eq. (7.10.4). But before we do that, let us find out what economic theory has to say 
about the short-run cubic cost function (7.10.4). Elementary price theory shows that in 
the short run the marginal cost (MC) and average cost (AC) curves of production are 
typically U-shaped: initially, as output increases, both MC and AC decline, but after a 
certain level of output they both turn upward, again the consequence of the law of 
diminishing returns. This can be seen in Figure 7.3 (see also Figure 7.1). And since the MC 
and AC curves are derived from the total cost curve, the U-shaped nature of these curves 
puts some restrictions on the parameters of the total cost curve (7.10.4). As a matter of 
fact, it can be shown that the parameters of Eq. (7.10.4) must satisfy the following 
restrictions if one is to observe the typical U-shaped short-run marginal and average cost 
curves:¹⁸ 

1. y3 0 , /3i, and £3 > 0 

2. p 2 < 0 ( 7 . 10 . 5 ) 

3. 

All this theoretical discussion might seem a bit tedious. But this knowledge is extremely useful when we examine the empirical results, for if the empirical results do not agree with prior expectations, then, assuming we have not committed a specification error (i.e., chosen the wrong model), we will have to modify our theory or look for a new theory and start the empirical enquiry all over again. But as noted in the Introduction, this is the nature of any empirical investigation.

Empirical Results. When the third-degree polynomial regression was fitted to the data 
of Table 7.4, we obtained the following results: 

$$\hat{Y}_i = 141.7667 + 63.4776X_i - 12.9615X_i^2 + 0.9396X_i^3$$

se = (6.3753)   (4.7786)   (0.9857)   (0.0591)    $R^2 = 0.9983$    (7.10.6)

(Note: The figures in parentheses are the estimated standard errors.) Although we will examine 
the statistical significance of these results in the next chapter, the reader can verify that they 
are in conformity with the theoretical expectations listed in Eq. (7.10.5). We leave it as an 
exercise for the reader to interpret the regression (7.10.6). 
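The restrictions (7.10.5) can be checked mechanically against the point estimates in (7.10.6). The marginal cost curve is $MC = \beta_1 + 2\beta_2X + 3\beta_3X^2$. Given $\beta_3 > 0$, it reaches its minimum at $X^* = -\beta_2/(3\beta_3)$, and the minimum value is $\beta_1 - \beta_2^2/(3\beta_3)$. The third restriction is therefore exactly the condition that this minimum is positive. The Python sketch below does the check. The function names are hypothetical, and it compares point estimates only, ignoring sampling variability, which is taken up in the next chapter.

```python
def cubic_cost_restrictions_hold(b0, b1, b2, b3):
    """Check restrictions (7.10.5) on TC = b0 + b1*X + b2*X**2 + b3*X**3."""
    return b0 > 0 and b1 > 0 and b3 > 0 and b2 < 0 and b2**2 < 3 * b1 * b3

def marginal_cost_minimum(b1, b2, b3):
    """MC = b1 + 2*b2*X + 3*b3*X**2; for b3 > 0 its minimum is at X* = -b2 / (3*b3).

    Returns (X*, MC at X*), where MC at X* equals b1 - b2**2 / (3*b3).
    """
    x_star = -b2 / (3 * b3)
    mc_min = b1 + 2 * b2 * x_star + 3 * b3 * x_star**2
    return x_star, mc_min

# Point estimates from regression (7.10.6); standard errors are ignored here
b0, b1, b2, b3 = 141.7667, 63.4776, -12.9615, 0.9396

print(cubic_cost_restrictions_hold(b0, b1, b2, b3))  # True
x_star, mc_min = marginal_cost_minimum(b1, b2, b3)
print(f"MC is lowest at X = {x_star:.2f}, where MC = {mc_min:.2f}")
```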


18 See Alpha C. Chiang, Fundamental Methods of Mathematical Economics, 3d ed., McGraw-Hill, New 
York, 1984, pp. 250-252. 
EXAMPLE 7.5 GDP Growth Rate and Relative per Capita GDP for 2007 in 190 Countries (in billions of 2000 dollars)

Source: World Bank World Development Indicators, adjusted to 2000 base and Economic Research Service.

As an additional economic example of the polynomial regression model, consider the following regression results:

$$\widehat{\mathrm{GDPG}}_i = 5.5347 - 5.5788\,\mathrm{RGDP}_i + 2.8378\,\mathrm{RGDP}_i^2$$

se = (0.2435)   (1.5995)   (1.4391)    (7.10.7)

$R^2 = 0.1092 \qquad \text{adj } R^2 = 0.0996$

where GDPG = GDP growth rate, percent in 2007, and RGDP = relative per capita GDP in 2007 (percentage of U.S. GDP per capita, 2007). The adjusted $R^2$ (adj $R^2$) tells us that, after taking into account the number of regressors, the model explains only about 9.96 percent of the variation in GDPG. Even the unadjusted $R^2$ of 0.1092 seems low. This might seem a disappointing value, but, as we shall show in the next chapter, such low $R^2$ values are frequently encountered in cross-sectional data with a large number of observations. Besides, even an apparently low $R^2$ value can be statistically significant (i.e., different from zero), as we will also show in the next chapter.
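As a rough consistency check on the figures reported with (7.10.7), the Python sketch below recomputes the adjusted $R^2$ from $R^2 = 0.1092$ with n = 190 and k = 3 estimated parameters, using $\bar R^2 = 1 - (1 - R^2)(n-1)/(n-k)$. The result should match the reported 0.0996 up to rounding of $R^2$. Because the coefficient of $\mathrm{RGDP}^2$ is positive, the fitted growth rate is U-shaped in RGDP. The sketch also locates the minimum of that parabola, expressed in whatever units RGDP enters (7.10.7). The function and variable names are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k); k counts all parameters, intercept included."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Regression (7.10.7): 190 countries, 3 estimated parameters
print(round(adjusted_r2(0.1092, n=190, k=3), 4))   # close to the reported 0.0996

# Fitted GDPG = 5.5347 - 5.5788*RGDP + 2.8378*RGDP**2 is a U-shaped parabola
# (positive squared term); its minimum is where the derivative
# -5.5788 + 2*2.8378*RGDP equals zero.
b_rgdp, b_rgdp_sq = -5.5788, 2.8378
rgdp_turning = -b_rgdp / (2 * b_rgdp_sq)
print(round(rgdp_turning, 3))
```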


7.11* Partial Correlation Coefficients


Explanation of Simple and Partial Correlation Coefficients 

In Chapter 3 we introduced the coefficient of correlation r as a measure of the degree of linear association between two variables. For the three-variable regression model we can compute three correlation coefficients: $r_{12}$ (correlation between Y and $X_2$), $r_{13}$ (correlation coefficient between Y and $X_3$), and $r_{23}$ (correlation coefficient between $X_2$ and $X_3$). Notice that we are letting the subscript 1 represent Y for notational convenience. These correlation coefficients are called gross or simple correlation coefficients, or correlation coefficients of zero order. They can be computed by the definition of correlation coefficient given in Eq. (3.5.13).

But now consider this question: Does, say, $r_{12}$ in fact measure the “true” degree of (linear) association between Y and $X_2$ when a third variable $X_3$ may be associated with both of them? This question is analogous to the following question: Suppose the true regression model is (7.1.1), but we omit the variable $X_3$ from the model and simply regress Y on $X_2$, obtaining the slope coefficient of, say, $b_{12}$. Will this coefficient be equal to the true coefficient $\beta_2$ if the model (7.1.1) were estimated to begin with? The answer should be apparent from our discussion in Section 7.7. In general, $r_{12}$ is not likely to reflect the true degree of association between Y and $X_2$ in the presence of $X_3$. As a matter of fact, it is likely to give a false impression of the nature of association between Y and $X_2$, as will be shown shortly. Therefore, what we need is a correlation coefficient that is independent of the influence, if any, of $X_3$ on $X_2$ and Y. Such a correlation coefficient can be obtained and is known appropriately as the partial correlation coefficient. Conceptually, it is similar to the partial regression coefficient. We define

$r_{12.3}$ = partial correlation coefficient between Y and $X_2$, holding $X_3$ constant

$r_{13.2}$ = partial correlation coefficient between Y and $X_3$, holding $X_2$ constant

$r_{23.1}$ = partial correlation coefficient between $X_2$ and $X_3$, holding Y constant


*Optional.
These partial correlations can be easily obtained from the simple, or zero-order, correlation coefficients as follows (for proofs, see the exercises):¹⁹

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (7.11.1)$$

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} \qquad (7.11.2)$$

$$r_{23.1} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}} \qquad (7.11.3)$$


The partial correlations given in Eqs. (7.11.1) to (7.11.3) are called first-order correlation coefficients. By order we mean the number of secondary subscripts. Thus $r_{12.34}$ would be the correlation coefficient of order two, $r_{12.345}$ would be the correlation coefficient of order three, and so on. As noted previously, $r_{12}$, $r_{13}$, and so on are called simple or zero-order correlations. The interpretation of, say, $r_{12.34}$ is that it gives the coefficient of correlation between Y and $X_2$, holding $X_3$ and $X_4$ constant.
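Equation (7.11.1) can be cross-checked numerically. A first-order partial correlation $r_{12.3}$ is also the simple correlation between two sets of residuals: those from regressing Y on $X_3$ and those from regressing $X_2$ on $X_3$. In other words, it is computed after the linear influence of $X_3$ has been removed from both variables. The Python sketch below uses NumPy and simulated data (the data-generating coefficients and helper names are hypothetical) to compute $r_{12.3}$ both ways.

```python
import numpy as np

def partial_corr_12_3(r12, r13, r23):
    """First-order partial correlation r_{12.3}, Eq. (7.11.1)."""
    return (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

def residuals_on(z, v):
    """Residuals from the OLS regression of v on a constant and z."""
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

rng = np.random.default_rng(0)
n = 200
x3 = rng.normal(size=n)                      # hypothetical "nuisance" variable
x2 = 0.8 * x3 + rng.normal(size=n)
y = 1.0 + 0.5 * x2 - 1.2 * x3 + rng.normal(size=n)

R = np.corrcoef([y, x2, x3])                 # zero-order correlations; index 0 = Y
r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]

via_formula = partial_corr_12_3(r12, r13, r23)
via_residuals = np.corrcoef(residuals_on(x3, y), residuals_on(x3, x2))[0, 1]
print(via_formula, via_residuals)            # the two agree up to rounding
```

The residual route generalizes directly to higher-order partials: regress both variables on all the variables being held constant, then correlate the residuals.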

Interpretation of Simple and Partial Correlation Coefficients

In the two-variable case, the simple r had a straightforward meaning: It measured the 
degree of (linear) association (and not causation) between the dependent variable Y and the 
single explanatory variable X. But once we go beyond the two-variable case, we need to 
pay careful attention to the interpretation of the simple correlation coefficient. From 
Eq. (7.11.1), for example, we observe the following: 

1. Even if $r_{12} = 0$, $r_{12.3}$ will not be zero unless $r_{13}$ or $r_{23}$ or both are zero.

2. If $r_{12} = 0$ and $r_{13}$ and $r_{23}$ are nonzero and of the same sign, $r_{12.3}$ will be negative, whereas if they are of opposite signs, it will be positive. (With $r_{12} = 0$, the numerator of (7.11.1) reduces to $-r_{13}r_{23}$.) An example will make this point clear. Let Y = crop yield, $X_2$ = rainfall, and $X_3$ = temperature. Assume $r_{12} = 0$, that is, no association between crop yield and rainfall. Assume further that $r_{13}$ is positive and $r_{23}$ is negative. Then, as Eq. (7.11.1) shows, $r_{12.3}$ will be positive; that is, holding temperature constant, there is a positive association between yield and rainfall. This seemingly paradoxical result, however, is not surprising. Since temperature $X_3$ affects both yield Y and rainfall $X_2$, in order to find out the net relationship between crop yield and rainfall, we need to remove the influence of the “nuisance” variable temperature. This example shows how one might be misled by the simple coefficient of correlation.

3. The terms $r_{12.3}$ and $r_{12}$ (and similar comparisons) need not have the same sign.

4. In the two-variable case we have seen that $r^2$ lies between 0 and 1. The same property holds true of the squared partial correlation coefficients. Using this fact, the reader should verify that one can obtain the following expression from Eq. (7.11.1):

$$0 \le r_{12}^2 + r_{13}^2 + r_{23}^2 - 2r_{12}r_{13}r_{23} \le 1 \qquad (7.11.4)$$


19 Most computer programs for multiple regression analysis routinely compute the simple correlation 
coefficients; hence the partial correlation coefficients can be readily computed. 
which gives the interrelationships among the three zero-order correlation coefficients. Similar expressions can be derived from Eqs. (7.11.2) and (7.11.3).

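5. Suppose that $r_{13} = r_{23} = 0$. Does this mean that $r_{12}$ is also zero? The answer is obvious from Eq. (7.11.4), which in this case reduces to $0 \le r_{12}^2 \le 1$ and so places no restriction on $r_{12}$. The fact that Y and $X_3$, and $X_2$ and $X_3$, are uncorrelated does not mean that Y and $X_2$ are uncorrelated.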
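Restriction (7.11.4) gives a quick screen for whether three proposed zero-order correlations could come from the same set of data (compare Exercise 7.7). The quantity it bounds equals one minus the determinant of the 3 × 3 correlation matrix, and that determinant can never be negative for genuine correlations. The Python sketch below applies the screen and cross-checks the determinant with NumPy. The function names and the test triples are hypothetical.

```python
import numpy as np

def correlations_consistent(r12, r13, r23):
    """Screen three zero-order correlations using restriction (7.11.4)."""
    if not all(-1.0 <= r <= 1.0 for r in (r12, r13, r23)):
        return False
    value = r12**2 + r13**2 + r23**2 - 2 * r12 * r13 * r23
    return 0.0 <= value <= 1.0

def correlation_matrix_det(r12, r13, r23):
    """Determinant of the 3 x 3 correlation matrix; equals 1 minus the (7.11.4) quantity."""
    R = np.array([[1.0, r12, r13],
                  [r12, 1.0, r23],
                  [r13, r23, 1.0]])
    return np.linalg.det(R)

# Hypothetical triples
print(correlations_consistent(0.5, 0.5, 0.5))    # True: the quantity equals 0.5
print(correlations_consistent(0.9, 0.9, -0.9))   # False: the quantity exceeds 1
print(correlation_matrix_det(0.5, 0.5, 0.5))     # about 0.5
```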

In passing, note that the expression $r_{12.3}^2$ may be called the coefficient of partial determination. It may be interpreted as the proportion of the variation in Y not explained by the variable $X_3$ that has been explained by the inclusion of $X_2$ into the model (see Exercise 7.5). Conceptually it is similar to $R^2$.

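Before moving on, note the following relationships between $R^2$, simple correlation coefficients, and partial correlation coefficients:

$$R^2 = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2} \qquad (7.11.5)$$

$$R^2 = r_{12}^2 + (1 - r_{12}^2)\,r_{13.2}^2 \qquad (7.11.6)$$

$$R^2 = r_{13}^2 + (1 - r_{13}^2)\,r_{12.3}^2 \qquad (7.11.7)$$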


In concluding this section, consider the following. It was stated previously that $R^2$ will not decrease if an additional explanatory variable is introduced into the model, and this can be seen clearly from Eq. (7.11.6). This equation states that the proportion of the variation in Y explained by $X_2$ and $X_3$ jointly is the sum of two parts:

- the part explained by $X_2$ alone ($= r_{12}^2$), and
- the part not explained by $X_2$ ($= 1 - r_{12}^2$) times the proportion that is explained by $X_3$ after holding the influence of $X_2$ constant.

Now $R^2 > r_{12}^2$ so long as $r_{13.2}^2 > 0$. At worst, $r_{13.2}^2$ will be zero, in which case $R^2 = r_{12}^2$.
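The identities (7.11.5) to (7.11.7) hold exactly for sample correlations, so they can be verified numerically. The Python sketch below uses NumPy and simulated data (the data-generating coefficients are hypothetical). It computes $R^2$ from an OLS fit of Y on a constant, $X_2$, and $X_3$ and compares it with the three formulas. All four values should agree up to floating-point rounding, and $R^2$ should be at least $r_{12}^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)          # regressors are correlated
y = 2.0 + 1.0 * x2 + 0.7 * x3 + rng.normal(size=n)

# R^2 from the OLS regression of Y on a constant, X2 and X3
X = np.column_stack([np.ones(n), x2, x3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2_ols = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Zero-order correlations (subscript 1 = Y) and first-order partials
R = np.corrcoef([y, x2, x3])
r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
r12_3 = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))   # Eq. (7.11.1)
r13_2 = (r13 - r12 * r23) / np.sqrt((1 - r12**2) * (1 - r23**2))   # Eq. (7.11.2)

r2_eq5 = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)    # Eq. (7.11.5)
r2_eq6 = r12**2 + (1 - r12**2) * r13_2**2                           # Eq. (7.11.6)
r2_eq7 = r13**2 + (1 - r13**2) * r12_3**2                           # Eq. (7.11.7)

print(r2_ols, r2_eq5, r2_eq6, r2_eq7)
print("R^2 >= r12^2:", r2_ols >= r12**2)
```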


1. This chapter introduced the simplest possible multiple linear regression model, namely, 
the three-variable regression model. It is understood that the term linear refers to 
linearity in the parameters and not necessarily in the variables. 

2. Although a three-variable regression model is in many ways an extension of the two-variable model, there are some new concepts involved, such as partial regression coefficients, partial correlation coefficients, multiple correlation coefficient, adjusted and unadjusted (for degrees of freedom) $R^2$, multicollinearity, and specification bias.

3. This chapter also considered the functional form of the multiple regression model, such 
as the Cobb-Douglas production function and the polynomial regression model. 

4. Although $R^2$ and adjusted $R^2$ are overall measures of how the chosen model fits a given set of data, their importance should not be overplayed. What matters most are the underlying theoretical expectations about the model in terms of the a priori signs of the coefficients of the variables entering the model and, as shown in the following chapter, their statistical significance.

5. The results presented in this chapter can be easily generalized to a multiple linear regression model involving any number of regressors, but the algebra becomes very tedious. This tedium can be avoided by resorting to matrix algebra. For the interested reader, the extension to the k-variable regression model using matrix algebra is presented in Appendix C, which is optional. The general reader can read the remainder of the text without knowing much matrix algebra.
EXERCISES

Questions

7.1. Consider the data in Table 7.5.

TABLE 7.5

Y    X2    X3
1    1      2
3    2      1
8    3     -3


Based on these data, estimate the following regressions:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + u_{1i} \qquad (1)$$

$$Y_i = \lambda_1 + \lambda_3 X_{3i} + u_{2i} \qquad (2)$$

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (3)$$

Note: Estimate only the coefficients and not the standard errors.

a. Is $\alpha_2 = \beta_2$? Why or why not?

b. Is $\lambda_3 = \beta_3$? Why or why not?

What important conclusion do you draw from this exercise?

7.2. From the following data estimate the partial regression coefficients, their standard errors, and the adjusted and unadjusted $R^2$ values:

$\bar{Y} = 367.693 \qquad \bar{X}_2 = 402.760 \qquad \bar{X}_3 = 8.0$

$\sum (Y_i - \bar{Y})^2 = 66042.269 \qquad \sum (X_{2i} - \bar{X}_2)^2 = 84855.096$

$\sum (X_{3i} - \bar{X}_3)^2 = 280.000 \qquad \sum (Y_i - \bar{Y})(X_{2i} - \bar{X}_2) = 74778.346$

$\sum (Y_i - \bar{Y})(X_{3i} - \bar{X}_3) = 4250.900 \qquad \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3) = 4796.000$

$n = 15$


7.3. Show that Eq. (7.4.7) can also be expressed as

$$\hat\beta_2 = \frac{\sum y_i (x_{2i} - b_{23} x_{3i})}{\sum (x_{2i} - b_{23} x_{3i})^2} = \frac{\text{net (of } x_3\text{) covariation between } y \text{ and } x_2}{\text{net (of } x_3\text{) variation in } x_2}$$

where $b_{23}$ is the slope coefficient in the regression of $X_2$ on $X_3$. (Hint: Recall that $b_{23} = \sum x_{2i} x_{3i} / \sum x_{3i}^2$.)

7.4. In a multiple regression model you are told that the error term $u_i$ has the following probability distribution, namely, $u_i \sim N(0, 4)$. How would you set up a Monte Carlo experiment to verify that the true variance is in fact 4?

7.5. Show that $r_{12.3}^2 = (R^2 - r_{13}^2)/(1 - r_{13}^2)$ and interpret the equation.

7.6. If the relation $a_1 X_1 + a_2 X_2 + a_3 X_3 = 0$ holds true for all values of $X_1$, $X_2$, and $X_3$, find the values of the three partial correlation coefficients.

7.7. Is it possible to obtain the following from a set of data?

a. $r_{12} = 0.9$, $r_{13} = -0.2$, $r_{23} = 0.8$

b. $r_{12} = 0.6$, $r_{23} = -0.9$, $r_{31} = -0.5$

c. $r_{21} = 0.01$, $r_{13} = 0.66$, $r_{23} = -0.7$
7.8. Consider the following model:

$$Y_i = \beta_1 + \beta_2\,\text{Education}_i + \beta_3\,\text{Years of experience}_i + u_i$$

Suppose you leave out the years of experience variable. What kinds of problems or biases would you expect? Explain verbally.

7.9. Show that $\beta_2$ and $\beta_3$ in Eq. (7.9.2) do, in fact, give output elasticities of labor and capital. (This question can be answered without using calculus; just recall the definition of the elasticity coefficient and remember that a change in the logarithm of a variable is a relative change, assuming the changes are rather small.)

7.10. Consider the three-variable linear regression model discussed in this chapter.

a. Suppose you multiply all the $X_2$ values by 2. What will be the effect of this rescaling, if any, on the estimates of the parameters and their standard errors?

b. Now instead of (a), suppose you multiply all the Y values by 2. What will be the effect of this, if any, on the estimated parameters and their standard errors?

7.11. In general $R^2 \ne r_{12}^2 + r_{13}^2$, but it is so only if $r_{23} = 0$. Comment and point out the significance of this finding. (Hint: See Eq. [7.11.5].)

7.12. Consider the following models.*

Model A: $Y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + u_{1t}$

Model B: $(Y_t - X_{2t}) = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_{2t}$

a. Will OLS estimates of $\alpha_1$ and $\beta_1$ be the same? Why?

b. Will OLS estimates of $\alpha_3$ and $\beta_3$ be the same? Why?

c. What is the relationship between $\alpha_2$ and $\beta_2$?

d. Can you compare the $R^2$ terms of the two models? Why or why not?

7.13. Suppose you estimate the consumption function†

$$Y_i = \alpha_1 + \alpha_2 X_i + u_{1i}$$

and the savings function

$$Z_i = \beta_1 + \beta_2 X_i + u_{2i}$$

where Y = consumption, Z = savings, X = income, and X = Y + Z, that is, income is equal to consumption plus savings.

a. What is the relationship, if any, between $\alpha_2$ and $\beta_2$? Show your calculations.

b. Will the residual sum of squares, RSS, be the same for the two models? Explain.

c. Can you compare the $R^2$ terms of the two models? Why or why not?

7.14. Suppose you express the Cobb-Douglas model given in Eq. (7.9.1) as follows:

$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} u_i$$

If you take the log-transform of this model, you will have $\ln u_i$ as the disturbance term on the right-hand side.

a. What probabilistic assumptions do you have to make about $\ln u_i$ to be able to apply the classical normal linear regression model (CNLRM)? How would you test this with the data given in Table 7.3?

b. Do the same assumptions apply to $u_i$? Why or why not?

*Adapted from Wojciech W. Charemza and Derek F. Deadman, Econometric Practice: General to Specific Modelling, Cointegration and Vector Autoregression, Edward Elgar, Brookfield, Vermont, 1992, p. 18.

†Adapted from Peter Kennedy, A Guide to Econometrics, 3d ed., The MIT Press, Cambridge, Massachusetts, 1992, p. 308, Question #9.
7.15. Regression through the origin. Consider the following regression through the origin:

$$Y_i = \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat u_i$$

a. How would you go about estimating the unknowns?

b. Will $\sum \hat u_i$ be zero for this model? Why or why not?

c. Will $\sum \hat u_i X_{2i} = \sum \hat u_i X_{3i} = 0$ for this model?

d. When would you use such a model?

e. Can you generalize your results to the k-variable model?

(Hint: Follow the discussion for the two-variable case given in Chapter 6.)


Empirical Exercises

7.16. The demand for roses.* Table 7.6 gives quarterly data on these variables:

Y = quantity of roses sold, dozens
$X_2$ = average wholesale price of roses, dollars/dozen
$X_3$ = average wholesale price of carnations, dollars/dozen
$X_4$ = average weekly family disposable income, dollars/week
$X_5$ = the trend variable taking values of 1, 2, and so on, for the period 1971-III to 1975-II in the Detroit metropolitan area

You are asked to consider the following demand functions:

$$Y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + \alpha_4 X_{4t} + \alpha_5 X_{5t} + u_t$$

$$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + \beta_4 \ln X_{4t} + \beta_5 X_{5t} + u_t$$

a. Estimate the parameters of the linear model and interpret the results.

b. Estimate the parameters of the log-linear model and interpret the results.

TABLE 7.6 Quarterly Demand for Roses in Metro Detroit Area, from 1971-III to 1975-II

Year and Quarter   Y        X2     X3     X4       X5
1971-III           11,484   2.26   3.49   158.11    1
    -IV             9,348   2.54   2.85   173.36    2
1972-I              8,429   3.07   4.06   165.26    3
    -II            10,079   2.91   3.64   172.92    4
    -III            9,240   2.73   3.21   178.46    5
    -IV             8,862   2.77   3.66   198.62    6
1973-I              6,216   3.59   3.76   186.28    7
    -II             8,253   3.23   3.49   188.98    8
    -III            8,038   2.60   3.13   180.49    9
    -IV             7,476   2.89   3.20   183.33   10
1974-I              5,911   3.77   3.65   181.87   11
    -II             7,950   3.64   3.60   185.00   12
    -III            6,134   2.82   2.94   184.00   13
    -IV             5,868   2.96   3.12   188.20   14
1975-I              3,160   4.24   3.58   175.67   15
    -II             5,872   3.69   3.53   188.00   16

*I am indebted to Joe Walsh for collecting these data from a major wholesaler in the Detroit metropolitan area and subsequently processing them.
c. $\beta_2$, $\beta_3$, and $\beta_4$ give, respectively, the own-price, cross-price, and income elasticities of demand. What are their a priori signs? Do the results concur with the a priori expectations?

d. How would you compute the own-price, cross-price, and income elasticities for the linear model?

e. On the basis of your analysis, which model, if either, would you choose and why?

7.17. Wildcat activity. Wildcats are wells drilled to find and produce oil and/or gas in an unproved area, or to find a new reservoir in a field previously found to be productive of oil or gas, or to extend the limit of a known oil or gas reservoir. Table 7.7 gives data on these variables:*

Y = the number of wildcats drilled
$X_2$ = price at the wellhead in the previous period (in constant dollars, 1972 = 100)
$X_3$ = domestic output
$X_4$ = GNP constant dollars (1972 = 100)
$X_5$ = trend variable, 1948 = 1, 1949 = 2, ..., 1978 = 31

See if the following model fits the data:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 \ln X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + u_t$$

a. Can you offer an a priori rationale to this model?

b. Assuming the model is acceptable, estimate the parameters of the model and their standard errors, and obtain $R^2$ and $\bar R^2$.

c. Comment on your results in view of your prior expectations.

d. What other specification would you suggest to explain wildcat activity? Why?

7.18. U.S. defense budget outlays, 1962-1981. In order to explain the U.S. defense budget, you are asked to consider the following model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + u_t$$

where $Y_t$ = defense budget outlay for year t, billions of dollars
$X_{2t}$ = GNP for year t, billions of dollars
$X_{3t}$ = U.S. military sales/assistance in year t, billions of dollars
$X_{4t}$ = aerospace industry sales, billions of dollars
$X_{5t}$ = military conflicts involving more than 100,000 troops. This variable takes a value of 1 when 100,000 or more troops are involved but is equal to zero when that number is under 100,000.

To test this model, you are given the data in Table 7.8.

a. Estimate the parameters of this model and their standard errors and obtain $R^2$, modified $R^2$, and $\bar R^2$.

b. Comment on the results, taking into account any prior expectations you have about the relationship between Y and the various X variables.

c. What other variable(s) might you want to include in the model and why?


*I am indebted to Raymond Savino for collecting and processing these data.
TABLE 7.7 Wildcat Activity

Source: Energy Information Administration, 1978 Report I

Columns: Y = thousands of wildcats; X2 = per barrel price, constant dollars; X3 = domestic output (millions of barrels per day); X4 = GNP, constant dollars, billions; X5 = time.

Y       X2     X3     X4          X5
8.01    4.89   5.52   487.67      1948 = 1
9.06    4.83   5.05   490.59      1949 = 2
10.31   4.68   5.41   533.55      1950 = 3
11.76   4.42   6.16   576.57      1951 = 4
12.43   4.36   6.26   598.62      1952 = 5
13.31   4.55   6.34   621.77      1953 = 6
13.10   4.66   6.81   613.67      1954 = 7
14.94   4.54   7.15   654.80      1955 = 8
16.17   4.44   7.17   668.84      1956 = 9
14.71   4.75   6.71   681.02      1957 = 10
13.20   4.56   7.05   679.53      1958 = 11
13.19   4.29   7.04   720.53      1959 = 12
11.70   4.19   7.18   736.86      1960 = 13
10.99   4.17   7.33   755.34      1961 = 14
10.80   4.11   7.54   799.15      1962 = 15
10.66   4.04   7.61   830.70      1963 = 16
10.75   3.96   7.80   874.29      1964 = 17
9.47    3.85   8.30   925.86      1965 = 18
10.31   3.75   8.81   980.98      1966 = 19
8.88    3.69   8.66   1,007.72    1967 = 20
8.88    3.56   8.78   1,051.83    1968 = 21
9.70    3.56   9.18   1,078.76    1969 = 22
7.69    3.48   9.03   1,075.31    1970 = 23
6.92    3.53   9.00   1,107.48    1971 = 24
7.54    3.39   8.78   1,171.10    1972 = 25
7.47    3.68   8.38   1,234.97    1973 = 26
8.63    5.92   8.01   1,217.81    1974 = 27
9.21    6.03   7.78   1,202.36    1975 = 28
9.23    6.12   7.88   1,271.01    1976 = 29
9.96    6.05   7.88   1,332.67    1977 = 30
10.78   5.89   8.67   1,385.10    1978 = 31


7.19. The demand for chicken in the United States, 1960-1982. To study the per capita consumption of chicken in the United States, you are given the data in Table 7.9,

where Y = per capita consumption of chickens, lb
$X_2$ = real disposable income per capita, dollars
$X_3$ = real retail price of chicken per lb, ¢
$X_4$ = real retail price of pork per lb, ¢
$X_5$ = real retail price of beef per lb, ¢
$X_6$ = composite real price of chicken substitutes per lb, ¢, which is a weighted average of the real retail prices per lb of pork and beef, the weights being the relative consumptions of beef and pork in total beef and pork consumption
TABLE 7.8 U.S. Defense Budget Outlays, 1962-1981

Source: These data were collected by Albert Lucchino from … publications.

Columns: Y = defense budget outlays; X2 = GNP; X3 = U.S. military sales/assistance; X4 = aerospace industry sales; X5 = conflicts involving 100,000+ troops.

Year   Y       X2         X3      X4     X5
1962   51.1    560.3      0.6     16.0   0
1963   52.3    590.5      0.9     16.4   0
1964   53.6    632.4      1.1     16.7   0
1965   49.6    684.9      1.4     17.0   1
1966   56.8    749.9      1.6     20.2   1
1967   70.1    793.9      1.0     23.4   1
1968   80.5    865.0      0.8     25.6   1
1969   81.2    931.4      1.5     24.6   1
1970   80.3    992.7      1.0     24.8   1
1971   77.7    1,077.6    1.5     21.7   1
1972   78.3    1,185.9    2.95    21.5   1
1973   74.5    1,326.4    4.8     24.3   0
1974   77.8    1,434.2    10.3    26.8   0
1975   85.6    1,549.2    16.0    29.5   0
1976   89.4    1,718.0    14.7    30.4   0
1977   97.5    1,918.3    8.3     33.3   0
1978   105.2   2,163.9    11.0    38.0   0
1979   117.7   2,417.8    13.0    46.2   0
1980   135.9   2,633.1    15.3    57.6   0
1981   162.1   2,937.7    18.0    68.9   0


TABLE 7.9 Demand for Chicken in the U.S., 1960-1982

Source: Data on Y are from Citibase and on X2 through X6 are from the U.S. Department of Agriculture. I am indebted to Robert J. Fisher for collecting the data and for the statistical …

Year   Y      X2         X3     X4      X5      X6
1960   27.8   397.5      42.2   50.7    78.3    65.8
1961   29.9   413.3      38.1   52.0    79.2    66.9
1962   29.8   439.2      40.3   54.0    79.2    67.8
1963   30.8   459.7      39.5   55.3    79.2    69.6
1964   31.2   492.9      37.3   54.7    77.4    68.7
1965   33.3   528.6      38.1   63.7    80.2    73.6
1966   35.6   560.3      39.3   69.8    80.4    76.3
1967   36.4   624.6      37.8   65.9    83.9    77.2
1968   36.7   666.4      38.4   64.5    85.5    78.1
1969   38.4   717.8      40.1   70.0    93.7    84.7
1970   40.4   768.2      38.6   73.2    106.1   93.3
1971   40.3   843.3      39.8   67.8    104.8   89.7
1972   41.8   911.6      39.7   79.1    114.0   100.7
1973   40.4   931.1      52.1   95.4    124.1   113.5
1974   40.7   1,021.5    48.9   94.2    127.6   115.3
1975   40.1   1,165.9    58.3   123.5   142.9   136.7
1976   42.7   1,349.6    57.9   129.9   143.6   139.2
1977   44.1   1,449.4    56.5   117.6   139.2   132.0
1978   46.7   1,575.5    63.7   130.9   165.5   132.1
1979   50.6   1,759.1    61.6   129.8   203.3   154.4
1980   50.1   1,994.2    58.9   128.0   219.6   174.9
1981   51.7   2,258.1    66.4   141.0   221.6   180.8
1982   52.9   2,478.7    70.4   168.2   232.6   189.4
Now consider the following demand functions:

$$\ln Y_t = \alpha_1 + \alpha_2 \ln X_{2t} + \alpha_3 \ln X_{3t} + u_t \qquad (1)$$

$$\ln Y_t = \gamma_1 + \gamma_2 \ln X_{2t} + \gamma_3 \ln X_{3t} + \gamma_4 \ln X_{4t} + u_t \qquad (2)$$

$$\ln Y_t = \lambda_1 + \lambda_2 \ln X_{2t} + \lambda_3 \ln X_{3t} + \lambda_4 \ln X_{5t} + u_t \qquad (3)$$

$$\ln Y_t = \theta_1 + \theta_2 \ln X_{2t} + \theta_3 \ln X_{3t} + \theta_4 \ln X_{4t} + \theta_5 \ln X_{5t} + u_t \qquad (4)$$

$$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + \beta_4 \ln X_{6t} + u_t \qquad (5)$$


From microeconomic theory it is known that the demand for a commodity generally depends on the real income of the consumer, the real price of the commodity, and the real prices of competing or complementary commodities. In view of these considerations, answer the following questions.

a. Which demand function among the ones given here would you choose, and why?

b. How would you interpret the coefficients of $\ln X_{2t}$ and $\ln X_{3t}$ in these models?

c. What is the difference between specifications (2) and (4)?

d. What problems do you foresee if you adopt specification (4)? (Hint: Prices of both pork and beef are included along with the price of chicken.)

e. Since specification (5) includes the composite price of beef and pork, would you prefer the demand function (5) to the function (4)? Why?

f. Are pork and/or beef competing or substitute products to chicken? How do you know?

g. Assume function (5) is the “correct” demand function. Estimate the parameters of this model, obtain their standard errors, and $R^2$, $\bar R^2$, and modified $R^2$. Interpret your results.

h. Now suppose you run the “incorrect” model (2). Assess the consequences of this misspecification by considering the values of $\gamma_2$ and $\gamma_3$ in relation to $\beta_2$ and $\beta_3$, respectively. (Hint: Pay attention to the discussion in Section 7.7.)

7.20. In a study of turnover in the labor market, James F. Ragan, Jr., obtained the following results for the U.S. economy for the period 1950-I to 1979-IV.* (Figures in parentheses are the estimated t statistics.)

$$\widehat{\ln Y_t} = 4.47 - 0.34\ln X_{2t} + 1.22\ln X_{3t} + 1.22\ln X_{4t} + 0.80\ln X_{5t} - 0.0055X_{6t}$$

t = (4.28)   (-5.31)   (3.64)   (3.10)   (1.10)   (-3.09)    $R^2 = 0.5370$

Note: We will discuss the t statistics in the next chapter.

where Y = quit rate in manufacturing, defined as number of people leaving jobs voluntarily per 100 employees
$X_2$ = an instrumental or proxy variable for adult male unemployment rate
$X_3$ = percentage of employees younger than 25
$X_4$ = $N_{t-1}/N_{t-4}$ = ratio of manufacturing employment in quarter (t - 1) to that in quarter (t - 4)
$X_5$ = percentage of women employees
$X_6$ = time trend (1950-I = 1)

*Source: See Ragan's article, "Turnover in the Labor Market: A Study of Quit and Layoff Rates," Economic Review, Federal Reserve Bank of Kansas City, May 1981, pp. 13-22.
a. Interpret the foregoing results.

b. Is the observed negative relationship between the logs of Y and $X_2$ justifiable a priori?

c. Why is the coefficient of $\ln X_3$ positive?

d. Since the trend coefficient is negative, there is a secular decline of what percent in the quit rate, and why is there such a decline?

e. Is the $R^2$ “too” low?

f. Can you estimate the standard errors of the regression coefficients from the given data? Why or why not?

7.21. Consider the following demand function for money in the United States for the period 1980-1998:

$$M_t = \beta_1 Y_t^{\beta_2} r_t^{\beta_3} e^{u_t}$$

where M = real money demand, using the M2 definition of money
Y = real GDP
r = interest rate

To estimate the above demand for money function, you are given the data in Table 7.10.

Note: To convert nominal quantities into real quantities, divide M and GDP by CPI. There is no need to divide the interest rate variable by CPI. Also, note that we have given two interest rates, a short-term rate as measured by the 3-month Treasury bill rate and the long-term rate as measured by the yield on the 30-year Treasury bond, as prior empirical studies have used both types of interest rates.
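The deflation step in the note can be sketched in Python using the first three years of Table 7.10. Dividing by CPI/100 rather than by CPI expresses the real series in 1982-1984 dollars. That rescaling is a choice made here, not something the exercise requires, and in a log-linear demand function it changes only the constant term. The helper name `deflate` is hypothetical.

```python
# Nominal GDP and M2 (billions of dollars) and CPI (1982-1984 = 100)
# for 1980-1982, copied from Table 7.10
gdp = [2795.6, 3131.3, 3259.2]
m2 = [1600.4, 1756.1, 1911.2]
cpi = [82.4, 90.9, 96.5]

def deflate(nominal, price_index, base=100.0):
    """Divide each nominal value by price_index/base to obtain real values."""
    return [v / (p / base) for v, p in zip(nominal, price_index)]

real_gdp = deflate(gdp, cpi)   # real GDP, 1982-1984 dollars
real_m2 = deflate(m2, cpi)     # real money stock, 1982-1984 dollars
for year, g, m in zip((1980, 1981, 1982), real_gdp, real_m2):
    print(year, round(g, 1), round(m, 1))
```

The interest rates are left undeflated, as the note instructs.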


TABLE 7.10 Demand for Money in the United States, 1980-1998

Source: Economic Report of the President, 2000, Tables …

Observation   GDP      M2       CPI     LTRATE   TBRATE
1980          2795.6   1600.4    82.4   11.27    11.506
1981          3131.3   1756.1    90.9   13.45    14.029
1982          3259.2   1911.2    96.5   12.76    10.686
1983          3534.9   2127.8    99.6   11.18     8.630
1984          3932.7   2311.7   103.9   12.41     9.580
1985          4213.0   2497.4   107.6   10.79     7.480
1986          4452.9   2734.0   109.6    7.78     5.980
1987          4742.5   2832.8   113.6    8.59     5.820
1988          5108.3   2995.8   118.3    8.96     6.690
1989          5489.1   3159.9   124.0    8.45     8.120
1990          5803.2   3279.1   130.7    8.61     7.510
1991          5986.2   3379.8   136.2    8.14     5.420
1992          6318.9   3434.1   140.3    7.67     3.450
1993          6642.3   3487.5   144.5    6.59     3.020
1994          7054.3   3502.2   148.2    7.37     4.290
1995          7400.5   3649.3   152.4    6.88     5.510
1996          7813.2   3824.2   156.9    6.71     5.020
1997          8300.8   4046.7   160.5    6.61     5.070
1998          8759.9   4401.4   163.0    5.58     4.810

Notes: GDP: gross domestic product (billions of dollars).
M2: M2 money supply.
CPI: Consumer Price Index (1982-1984 = 100).
LTRATE: long-term interest rate (30-year Treasury bond).
TBRATE: three-month Treasury bill rate (% per annum).
a. Given the data, estimate the above demand function. What are the income and interest rate elasticities of demand for money?

b. Instead of estimating the above demand function, suppose you were to fit the function $(M/Y)_t = \alpha_1 r_t^{\alpha_2} e^{u_t}$. How would you interpret the results? Show the necessary calculations.

c. How will you decide which is a better specification? (Note: A formal statistical test will be given in Chapter 8.)

7.22. Table 7.11 gives data for the manufacturing sector of the Greek economy for the period 1961-1987.

a. See if the Cobb-Douglas production function fits the data given in the table and interpret the results. What general conclusion do you draw?

b. Now consider the following model:

$$\text{Output}/\text{labor} = A\,(K/L)^{\beta} e^{u}$$

where the regressand represents labor productivity and the regressor represents the capital-labor ratio. What is the economic significance of such a relationship, if any? Estimate the parameters of this model and interpret your results.


TABLE 7.11
Greek Industrial Sector

Observation   Output*   Capital   Labor†   Capital-to-Labor Ratio
1961          35.858    59.600    637.0    0.0936
1962          37.504    64.200    643.2    0.0998
1963          40.378    68.800    651.0    0.1057
1964          46.147    75.500    685.7    0.1101
1965          51.047    84.400    710.7    0.1188
1966          53.871    91.800    724.3    0.1267
1967          56.834    99.900    735.2    0.1359
1968          65.439    109.100   760.3    0.1435
1969          74.939    120.700   777.6    0.1552
1970          80.976    132.000   780.8    0.1691
1971          90.802    146.600   825.8    0.1775
1972          101.955   162.700   864.1    0.1883
1973          114.367   180.600   894.2    0.2020
1974          101.823   197.100   891.2    0.2212
1975          107.572   209.600   887.5    0.2362
1976          117.600   221.900   892.3    0.2487
1977          123.224   232.500   930.1    0.2500
1978          130.971   243.500   969.9    0.2511
1979          138.842   257.700   1006.9   0.2559
1980          135.486   274.400   1020.9   0.2688
1981          133.441   289.500   1017.1   0.2846
1982          130.388   301.900   1016.1   0.2971
1983          130.615   314.900   1008.1   0.3124
1984          132.244   327.700   985.1    0.3327
1985          137.318   339.400   977.1    0.3474
1986          137.468   349.492   1007.2   0.3470
1987          135.750   358.231   1000.0   0.3582

*At constant 1970 prices.
†Thousands of workers.

Source: I am indebted to George K. Zestos of Christopher Newport University, Virginia, for these data.
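For part (b), the following NumPy sketch shows one way to estimate the model. It assumes the model is fitted in log form, ln(Output/L) = ln A + β ln(K/L), with an error term appended; the exercise does not state one. The arrays are a hand transcription of Table 7.11, and `fit_productivity_model` is a hypothetical helper name. The script also recomputes K/L to check that the transcription agrees with the table's last column.

```python
import numpy as np

# Table 7.11, Greek manufacturing sector, 1961-1987 (hand-transcribed from the table).
output = np.array([35.858, 37.504, 40.378, 46.147, 51.047, 53.871, 56.834, 65.439,
                   74.939, 80.976, 90.802, 101.955, 114.367, 101.823, 107.572,
                   117.600, 123.224, 130.971, 138.842, 135.486, 133.441, 130.388,
                   130.615, 132.244, 137.318, 137.468, 135.750])
capital = np.array([59.600, 64.200, 68.800, 75.500, 84.400, 91.800, 99.900, 109.100,
                    120.700, 132.000, 146.600, 162.700, 180.600, 197.100, 209.600,
                    221.900, 232.500, 243.500, 257.700, 274.400, 289.500, 301.900,
                    314.900, 327.700, 339.400, 349.492, 358.231])
labor = np.array([637.0, 643.2, 651.0, 685.7, 710.7, 724.3, 735.2, 760.3, 777.6,
                  780.8, 825.8, 864.1, 894.2, 891.2, 887.5, 892.3, 930.1, 969.9,
                  1006.9, 1020.9, 1017.1, 1016.1, 1008.1, 985.1, 977.1, 1007.2, 1000.0])
kl_ratio_table = np.array([0.0936, 0.0998, 0.1057, 0.1101, 0.1188, 0.1267, 0.1359,
                           0.1435, 0.1552, 0.1691, 0.1775, 0.1883, 0.2020, 0.2212,
                           0.2362, 0.2487, 0.2500, 0.2511, 0.2559, 0.2688, 0.2846,
                           0.2971, 0.3124, 0.3327, 0.3474, 0.3470, 0.3582])

def fit_productivity_model(output, capital, labor):
    """OLS of ln(Output/Labor) on ln(K/L); returns (A, beta) of Output/L = A (K/L)^beta."""
    ln_productivity = np.log(output / labor)
    ln_kl = np.log(capital / labor)
    beta, ln_a = np.polyfit(ln_kl, ln_productivity, deg=1)  # slope first, then intercept
    return np.exp(ln_a), beta

# Transcription check: K/L recomputed from the data should match the table's last column.
print("max |K/L - table|:", np.max(np.abs(capital / labor - kl_ratio_table)))

A_hat, beta_hat = fit_productivity_model(output, capital, labor)
print(f"A = {A_hat:.4f}, beta = {beta_hat:.4f}")
```

The slope on ln(K/L) is the elasticity of labor productivity with respect to the capital-labor ratio. That is the quantity part (b) asks you to interpret.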
7.23. Monte Carlo experiment: Consider the following model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

You are told that $\beta_1 = 262$, $\beta_2 = -0.006$, $\beta_3 = -2.4$, $\sigma^2 = 42$, and $u_i \sim N(0, 42)$.
Generate 10 sets of 64 observations on $u_i$ from the given normal distribution and use
the 64 observations given in Table 6.4, where Y = CM, $X_2$ = PGNP, and $X_3$ = FLR,
to generate 10 sets of the estimated β coefficients (each set will have the three estimated
parameters). Take the averages of each of the estimated β coefficients and relate them to
the true values of these coefficients given above. What overall conclusion do you draw?
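A minimal NumPy sketch of the experiment follows. It assumes the X values are held fixed across the 10 replications, as the exercise implies. Table 6.4 is not reproduced here, so `x2_placeholder` and `x3_placeholder` are hypothetical stand-ins drawn at random. Replace them with the actual PGNP and FLR columns before drawing any conclusion. `monte_carlo_ols` is an illustrative name, not a library function.

```python
import numpy as np

BETA_TRUE = np.array([262.0, -0.006, -2.4])  # beta_1, beta_2, beta_3 given in Exercise 7.23
SIGMA2 = 42.0                                  # error variance: u_i ~ N(0, 42)

def monte_carlo_ols(x2, x3, n_sets=10, seed=0):
    """Return an (n_sets, 3) array of OLS estimates, holding X2 and X3 fixed."""
    rng = np.random.default_rng(seed)
    n = len(x2)
    X = np.column_stack([np.ones(n), x2, x3])
    estimates = np.empty((n_sets, 3))
    for s in range(n_sets):
        u = rng.normal(0.0, np.sqrt(SIGMA2), size=n)   # a fresh draw of n errors
        y = X @ BETA_TRUE + u                          # Y_i = b1 + b2 X2i + b3 X3i + u_i
        estimates[s] = np.linalg.lstsq(X, y, rcond=None)[0]
    return estimates

# HYPOTHETICAL stand-ins for PGNP and FLR; substitute the 64 rows of Table 6.4.
rng = np.random.default_rng(123)
x2_placeholder = rng.uniform(100.0, 20000.0, size=64)
x3_placeholder = rng.uniform(10.0, 95.0, size=64)

est = monte_carlo_ols(x2_placeholder, x3_placeholder, n_sets=10)
print("average of 10 sets:", est.mean(axis=0))
print("true values:       ", BETA_TRUE)
```

With only 10 sets the averages can still wander noticeably from the true values. Increasing `n_sets` shows them settling down, which is the unbiasedness property the exercise is driving at.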

7.24. Table 7.12 gives data for real consumption expenditure, real income, real wealth, and 
real interest rates for the U.S. for the years 1947-2000. These data will be used again 
for Exercise 8.35. 

a. Given the data in the table, estimate the linear consumption function using income, 
wealth, and interest rate. What is the fitted equation? 

b. What do the estimated coefficients indicate about the variables’ relationships to 
consumption expenditure? 


TABLE 7.12
Real Consumption Expenditure, Real Income, Real Wealth, and Real Interest Rates for the U.S., 1947-2000

Sources: C, Yd, and quarterly and annual chain-type price indexes (1996 = 100): Bureau of Economic Analysis, U.S. Department of Commerce (http://www.bea.doc.gov/bea/dnl.htm).
Nominal annual yield on 3-month Treasury securities: Economic Report of the President, 2002.
Nominal wealth = end-of-year net worth of households and nonprofits (from Federal Reserve flow of funds data: http://www.federalreserve.gov).

Year   C        Yd       Wealth    Interest Rate
1947   976.4    1035.2   5166.8    -10.351
1948   998.1    1090.0   5280.8    …
1949   1025.3   1095.6   5607.4    1.044
1950   1090.9   1192.7   5759.5    0.407
1951   1107.1   1227.0   6086.1    -5.283
1952   1142.4   1266.8   6243.9    -0.277
1953   1197.2   1327.5   6355.6    0.561
1954   1221.9   1344.0   6797.0    -0.138
1955   1310.4   1433.8   7172.2    0.262
1956   1348.8   1502.3   7375.2    -0.736
1957   1381.8   1539.5   7315.3    -0.261
1958   1393.0   1553.7   7870.0    -0.575
1959   1470.7   1623.8   8188.1    2.296
1960   1510.8   1664.8   8351.8    1.511
1961   1541.2   1720.0   8971.9    1.296
1962   1617.3   1803.5   9091.5    1.396
1963   1684.0   1871.5   9436.1    2.058
1964   1784.8   2006.9   10003.4   2.027
1965   1897.6   2131.0   10562.8   2.112
1966   2006.1   2244.6   10522.0   2.020
1967   2066.2   2340.5   11312.1   1.213
1968   2184.2   2448.2   12145.4   1.055
1969   2264.8   2524.3   11672.3   1.732
1970   2314.5   2630.0   11650.0   1.166
1971   2405.2   2745.3   12312.9   …
1972   2550.5   2874.3   13499.9   -0.156
1973   2675.9   3072.3   13081.0   1.414
1974   2653.7   3051.9   11868.8   -1.043
1975   2710.9   3108.5   12634.4   -3.534
1976   2868.9   3243.5   13456.8   -0.657
1977   2992.1   3360.7   13786.3   -1.190
1978   3124.7   3527.5   14450.5   0.113
1979   3203.2   3628.6   15340.0   1.704
1980   3193.0   3658.0   15965.0   2.298
1981   3236.0   3741.1   15965.0   4.704
1982   3275.5   3791.7   16312.5   4.449
1983   3454.3   3906.9   16944.8   4.691
1984   3640.6   4207.6   17526.7   5.848
1985   3820.9   4347.8   19068.3   4.331
1986   3981.2   4486.6   20530.0   3.768
1987   4113.4   4582.5   21235.7   2.819
1988   4279.5   4784.1   22332.0   3.287
1989   4393.7   4906.5   23659.8   4.318
1990   4474.5   5014.2   23105.1   3.595
1991   4466.6   5033.0   24050.2   1.803
1992   4594.5   5189.3   24418.2   1.007
1993   4748.9   5261.3   25092.3   0.625
1994   4928.1   5397.2   25218.6   2.206
1995   5075.6   5539.1   27439.7   3.333
1996   5237.5   5677.7   29448.2   3.083
1997   5423.9   5854.5   32664.1   3.120
1998   5683.7   6168.6   35587.0   3.584
1999   5968.4   6320.0   39591.3   3.245
2000   6257.8   6539.2   38167.7   3.576

C = real consumption expenditures in billions of chained 1996 dollars.

Yd = real personal disposable income in billions of chained 1996 dollars.

Wealth = real wealth in billions of chained 1996 dollars.

Interest = nominal annual yield on 3-month Treasury securities minus the inflation rate (measured by the annual % change in the annual chained price index).

The nominal wealth variable was created using data from the Federal Reserve Board's measure of end-of-year net worth for households and nonprofits in the flow of funds accounts. The price index used to convert this nominal wealth variable to a real wealth variable was the average of the chained price index from the 4th quarter of the current year and the 1st quarter of the subsequent year.


7.25. Estimating Qualcomm stock prices. As an example of polynomial regression,
consider data on the weekly stock prices of Qualcomm, Inc., a digital wireless
telecommunications designer and manufacturer, over the period 1995 to 2000. The full
data can be found on the textbook's website in Table 7.13. During the late 1990s,
technology stocks were particularly profitable, but what type of regression model will
best fit these data? Figure 7.4 shows a basic plot of the data for those years.

This plot does seem to resemble an elongated S curve. At first there is only a slight
increase in the average stock price, but then the rate of increase rises dramatically toward
the far right side of the graph. As the demand for more specialized phones increased and
the technology boom got under way, the stock price followed suit and increased at a much
faster rate.

a. Estimate a linear model to predict the closing stock price based on time. Does this
model seem to fit the data well?

b. Now estimate a quadratic model by using both time and time squared. Is this a better
fit than in (a)?
FIGURE 7.4
Qualcomm stock (vertical axis: Price; horizontal axis: Date).

c. Finally, fit the following cubic or third-degree polynomial:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i$$

where Y = stock price and X = time. Which model seems to be the best estimator
for the stock prices?
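The three fits can be compared with a short NumPy sketch like the one below. The Qualcomm data of Table 7.13 are not reproduced in the text, so `week` and `price` are a hypothetical S-shaped series used only to make the code run. `poly_fit_stats` is an illustrative helper.

```python
import numpy as np

def poly_fit_stats(x, y, degree):
    """OLS of y on 1, x, ..., x**degree; returns (coefficients, R^2, adjusted R^2)."""
    X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    n, k = X.shape
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return coef, r2, adj_r2

# HYPOTHETICAL S-shaped weekly series standing in for Table 7.13.
rng = np.random.default_rng(0)
week = np.arange(1, 301, dtype=float)
price = 20 + 0.02 * week - 3e-4 * week**2 + 2e-6 * week**3 + rng.normal(0, 2, week.size)

for d in (1, 2, 3):
    _, r2, adj = poly_fit_stats(week, price, d)
    print(f"degree {d}: R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")
```

The three models are nested, so R² cannot fall as higher powers of time are added. The adjusted R², or a formal test of the kind developed in Chapter 8, is therefore the fairer basis for choosing among them.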


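Appendix 7A

7A.1 Derivation of OLS Estimators Given in Equations (7.4.3) to (7.4.5)

Differentiating the equation

$$\sum \hat u_i^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i})^2 \qquad (7.4.2)$$

partially with respect to the three unknowns and setting the resulting equations to zero, we obtain

$$\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_1} = 2\sum (Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i})(-1) = 0$$

$$\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_2} = 2\sum (Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i})(-X_{2i}) = 0$$

$$\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_3} = 2\sum (Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i})(-X_{3i}) = 0$$

Simplifying these, we obtain Eqs. (7.4.3) to (7.4.5).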
In passing, note that the three preceding equations can also be written as

$$\sum \hat u_i = 0$$

$$\sum \hat u_i X_{2i} = 0 \qquad \text{(Why?)}$$

$$\sum \hat u_i X_{3i} = 0$$

which show the properties of the least-squares fit, namely, that the residuals sum to zero and that they
are uncorrelated with the explanatory variables $X_2$ and $X_3$.
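These properties are easy to confirm numerically. The sketch below uses hypothetical simulated data. It fits a three-variable regression with NumPy and evaluates the three sums, each of which is zero apart from floating-point rounding.

```python
import numpy as np

# HYPOTHETICAL data for a three-variable regression
rng = np.random.default_rng(42)
n = 50
X2 = rng.normal(10.0, 2.0, n)
X3 = rng.normal(5.0, 1.0, n)
Y = 1.0 + 0.5 * X2 - 2.0 * X3 + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), X2, X3])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
u_hat = Y - X @ beta_hat

# The first-order conditions: each sum is zero up to floating-point rounding.
print(u_hat.sum(), (u_hat * X2).sum(), (u_hat * X3).sum())
```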

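Incidentally, notice that to obtain the OLS estimators of the k-variable linear regression model
(7.4.20) we proceed analogously. Thus, we first write

$$\sum \hat u_i^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \cdots - \hat\beta_k X_{ki})^2$$

Differentiating this expression partially with respect to each of the k unknowns, setting the resulting
equations equal to zero, and rearranging, we obtain the following k normal equations in the k
unknowns:

$$\sum Y_i = n\hat\beta_1 + \hat\beta_2 \sum X_{2i} + \hat\beta_3 \sum X_{3i} + \cdots + \hat\beta_k \sum X_{ki}$$

$$\sum Y_i X_{2i} = \hat\beta_1 \sum X_{2i} + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_3 \sum X_{2i}X_{3i} + \cdots + \hat\beta_k \sum X_{2i}X_{ki}$$

$$\sum Y_i X_{3i} = \hat\beta_1 \sum X_{3i} + \hat\beta_2 \sum X_{2i}X_{3i} + \hat\beta_3 \sum X_{3i}^2 + \cdots + \hat\beta_k \sum X_{3i}X_{ki}$$

$$\vdots$$

$$\sum Y_i X_{ki} = \hat\beta_1 \sum X_{ki} + \hat\beta_2 \sum X_{2i}X_{ki} + \hat\beta_3 \sum X_{3i}X_{ki} + \cdots + \hat\beta_k \sum X_{ki}^2$$

Or, switching to small letters, these equations can be expressed as

$$\sum y_i x_{2i} = \hat\beta_2 \sum x_{2i}^2 + \hat\beta_3 \sum x_{2i}x_{3i} + \cdots + \hat\beta_k \sum x_{2i}x_{ki}$$

$$\sum y_i x_{3i} = \hat\beta_2 \sum x_{2i}x_{3i} + \hat\beta_3 \sum x_{3i}^2 + \cdots + \hat\beta_k \sum x_{3i}x_{ki}$$

$$\vdots$$

$$\sum y_i x_{ki} = \hat\beta_2 \sum x_{2i}x_{ki} + \hat\beta_3 \sum x_{3i}x_{ki} + \cdots + \hat\beta_k \sum x_{ki}^2$$

It should further be noted that the k-variable model also satisfies these equations:

$$\sum \hat u_i = 0$$

$$\sum \hat u_i X_{2i} = \sum \hat u_i X_{3i} = \cdots = \sum \hat u_i X_{ki} = 0$$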
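In matrix form, the k normal equations are the system (X′X)β̂ = X′Y, where X carries a column of ones followed by the regressors. The sketch below uses hypothetical data with k = 4, and its function names are illustrative. It does three things:

- solves that system directly;
- solves the deviation-form system for the slopes, then recovers the intercept from β̂₁ = Ȳ − β̂₂X̄₂ − ⋯ − β̂ₖX̄ₖ (the first normal equation divided by n);
- checks both results against NumPy's least-squares routine.

```python
import numpy as np

def ols_normal_equations(X_raw, Y):
    """Solve the k normal equations (X'X) b = X'Y, with a column of ones prepended."""
    n = len(Y)
    X = np.column_stack([np.ones(n), X_raw])
    return np.linalg.solve(X.T @ X, X.T @ Y)

def ols_deviation_form(X_raw, Y):
    """Solve the k-1 deviation-form equations for the slopes, then recover beta_1."""
    x = X_raw - X_raw.mean(axis=0)          # small letters: deviations from means
    y = Y - Y.mean()
    slopes = np.linalg.solve(x.T @ x, x.T @ y)
    intercept = Y.mean() - X_raw.mean(axis=0) @ slopes
    return np.concatenate([[intercept], slopes])

# HYPOTHETICAL data: three regressors X2, X3, X4, so k = 4
rng = np.random.default_rng(7)
X_raw = rng.normal(size=(100, 3))
Y = 2.0 + X_raw @ np.array([1.0, -0.5, 0.25]) + rng.normal(0.0, 0.1, 100)

b_raw = ols_normal_equations(X_raw, Y)
b_dev = ols_deviation_form(X_raw, Y)
print(b_raw)
print(b_dev)
```

Forming X′X explicitly is the most transparent route, but for ill-conditioned data `np.linalg.lstsq` (or a QR decomposition) is numerically safer.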
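7A.2 Equality between the Coefficients of PGNP in Equations (7.3.5) and (7.6.2)

Letting Y = CM, $X_2$ = PGNP, and $X_3$ = FLR and using the deviation form, write

$$y_i = b_{13} x_{3i} + \hat u_{1i} \qquad (1)$$

$$x_{2i} = b_{23} x_{3i} + \hat u_{2i} \qquad (2)$$

Now regress $\hat u_1$ on $\hat u_2$ to obtain:

$$a_1 = \frac{\sum \hat u_{1i}\hat u_{2i}}{\sum \hat u_{2i}^2} = -0.0056 \quad \text{(for our example)} \qquad (3)$$

Note that because the $\hat u$'s are residuals, their mean values are zero. Using (1) and (2), we can write
(3) as

$$a_1 = \frac{\sum (y_i - b_{13}x_{3i})(x_{2i} - b_{23}x_{3i})}{\sum (x_{2i} - b_{23}x_{3i})^2} \qquad (4)$$

Expand the preceding expression, and note that

$$b_{23} = \frac{\sum x_{2i}x_{3i}}{\sum x_{3i}^2} \qquad (5)$$

$$b_{13} = \frac{\sum y_i x_{3i}}{\sum x_{3i}^2} \qquad (6)$$

Making these substitutions into (4), we get

$$a_1 = \frac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i}x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i}x_{3i})^2} \qquad (7.4.7)$$

$$= -0.0056 \quad \text{(for our example)}$$

That is, $a_1$ is exactly the OLS estimator $\hat\beta_2$ of Eq. (7.4.7).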
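The equality can be checked numerically. The sketch below uses hypothetical data in which X2 is deliberately correlated with X3. It runs the two auxiliary regressions (1) and (2) in deviation form and regresses one set of residuals on the other as in (3). It then compares the result with the closed form (7.4.7) and with the full multiple regression. This residual-on-residual result is commonly known as the Frisch-Waugh (or Frisch-Waugh-Lovell) theorem.

```python
import numpy as np

def resid_through_origin(a, b):
    """Residuals from regressing a on b without intercept: a - (sum(ab)/sum(b^2)) b."""
    return a - (a @ b) / (b @ b) * b

# HYPOTHETICAL data in which X2 is correlated with X3
rng = np.random.default_rng(3)
n = 64
X3 = rng.uniform(10.0, 95.0, n)
X2 = 200.0 * X3 + rng.normal(0.0, 3000.0, n)
Y = 260.0 - 0.005 * X2 - 2.2 * X3 + rng.normal(0.0, 40.0, n)

y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()   # deviation form
u1 = resid_through_origin(y, x3)     # Eq. (1): y on x3
u2 = resid_through_origin(x2, x3)    # Eq. (2): x2 on x3
a1 = (u1 @ u2) / (u2 @ u2)           # Eq. (3): u1 on u2

# Closed form (7.4.7) and the full multiple regression, for comparison
num = (y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)
den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
beta2_formula = num / den
X = np.column_stack([np.ones(n), X2, X3])
beta2_ols = np.linalg.lstsq(X, Y, rcond=None)[0][1]
print(a1, beta2_formula, beta2_ols)
```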

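7A.3 Derivation of Equation (7.4.19)

Recall that

$$\hat u_i = Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}$$

which can also be written as

$$\hat u_i = y_i - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i}$$

where small letters, as usual, indicate deviations from mean values. Now

$$\sum \hat u_i^2 = \sum (\hat u_i \hat u_i) = \sum \hat u_i (y_i - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i}) = \sum \hat u_i y_i$$

where use is made of the fact that $\sum \hat u_i x_{2i} = \sum \hat u_i x_{3i} = 0$. (Why?) Also

$$\sum \hat u_i y_i = \sum y_i \hat u_i = \sum y_i (y_i - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i})$$

that is,

$$\sum \hat u_i^2 = \sum y_i^2 - \hat\beta_2 \sum y_i x_{2i} - \hat\beta_3 \sum y_i x_{3i} \qquad (7.4.19)$$

which is the required result.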
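A quick numerical check of (7.4.19) follows, using hypothetical simulated data. The RSS computed directly from the residuals should agree with the right-hand side of (7.4.19) up to rounding.

```python
import numpy as np

# HYPOTHETICAL data
rng = np.random.default_rng(11)
n = 40
X2 = rng.normal(50.0, 10.0, n)
X3 = rng.normal(20.0, 5.0, n)
Y = 5.0 + 0.8 * X2 - 1.5 * X3 + rng.normal(0.0, 3.0, n)

X = np.column_stack([np.ones(n), X2, X3])
b1, b2, b3 = np.linalg.lstsq(X, Y, rcond=None)[0]
u_hat = Y - (b1 + b2 * X2 + b3 * X3)

y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()   # deviations from means
rss_direct = u_hat @ u_hat
rss_7419 = y @ y - b2 * (y @ x2) - b3 * (y @ x3)           # right-hand side of (7.4.19)
print(rss_direct, rss_7419)
```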


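7A.4 Maximum Likelihood Estimation of the Multiple Regression Model

Extending the ideas introduced in Chapter 4, Appendix 4A, we can write the log-likelihood function
for the k-variable linear regression model (7.4.20) as

$$\ln L = -\frac{n}{2}\ln\sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum (Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki})^2$$

Differentiating this function partially with respect to $\beta_1, \beta_2, \ldots, \beta_k$ and $\sigma^2$, we obtain the following
(k + 1) equations:

$$\frac{\partial \ln L}{\partial \beta_1} = -\frac{1}{\sigma^2}\sum (Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki})(-1) \qquad (1)$$

$$\frac{\partial \ln L}{\partial \beta_2} = -\frac{1}{\sigma^2}\sum (Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki})(-X_{2i}) \qquad (2)$$

$$\vdots$$

$$\frac{\partial \ln L}{\partial \beta_k} = -\frac{1}{\sigma^2}\sum (Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki})(-X_{ki}) \qquad (k)$$

$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum (Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki})^2 \qquad (k+1)$$

Setting these equations equal to zero (the first-order condition for optimization) and letting
$\tilde\beta_1, \tilde\beta_2, \ldots, \tilde\beta_k$ and $\tilde\sigma^2$ denote the ML estimators, we obtain, after simple algebraic manipulations,

$$\sum Y_i = n\tilde\beta_1 + \tilde\beta_2 \sum X_{2i} + \cdots + \tilde\beta_k \sum X_{ki}$$

$$\sum Y_i X_{2i} = \tilde\beta_1 \sum X_{2i} + \tilde\beta_2 \sum X_{2i}^2 + \cdots + \tilde\beta_k \sum X_{2i}X_{ki}$$

$$\vdots$$

$$\sum Y_i X_{ki} = \tilde\beta_1 \sum X_{ki} + \tilde\beta_2 \sum X_{2i}X_{ki} + \cdots + \tilde\beta_k \sum X_{ki}^2$$

which are precisely the normal equations of the least-squares theory, as can be seen from
Appendix 7A, Section 7A.1. Therefore, the ML estimators, the $\tilde\beta$'s, are the same as the OLS estimators,
the $\hat\beta$'s, given previously. But as noted in Chapter 4, Appendix 4A, this equality is not accidental.

Substituting the ML (= OLS) estimators into the (k + 1)st equation just given, we obtain, after
simplification, the ML estimator of $\sigma^2$ as

$$\tilde\sigma^2 = \frac{1}{n}\sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_{2i} - \cdots - \tilde\beta_k X_{ki})^2 = \frac{1}{n}\sum \hat u_i^2$$

As noted in the text, this estimator differs from the OLS estimator $\hat\sigma^2 = \sum \hat u_i^2/(n - k)$. And since the
latter is an unbiased estimator of $\sigma^2$, this conclusion implies that the ML estimator $\tilde\sigma^2$ is a biased
estimator. But, as can be readily verified, asymptotically, $\tilde\sigma^2$ is unbiased too.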
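The bias of the ML estimator shows up clearly in a small simulation. The NumPy sketch below uses a hypothetical design (n = 20, k = 3, σ² = 4) and averages both estimators over many samples. Because $E(\hat\sigma^2) = \sigma^2$, the ML estimator $\tilde\sigma^2 = \hat\sigma^2 (n - k)/n$ has expectation $\sigma^2(n - k)/n$, which is 3.4 here. The shortfall $(k/n)\sigma^2$ vanishes as n grows, which is the asymptotic unbiasedness mentioned above.

```python
import numpy as np

def sigma2_estimators(X, y):
    """Return (OLS unbiased estimate RSS/(n-k), ML estimate RSS/n)."""
    n, k = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # ML = OLS for the betas
    rss = np.sum((y - X @ beta_hat) ** 2)
    return rss / (n - k), rss / n

# HYPOTHETICAL design held fixed across samples
rng = np.random.default_rng(5)
n, sigma2 = 20, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # k = 3
beta = np.array([1.0, 2.0, -1.0])

draws = np.array([sigma2_estimators(X, X @ beta + rng.normal(0.0, np.sqrt(sigma2), n))
                  for _ in range(20000)])
print("mean OLS estimate:", draws[:, 0].mean())   # should be close to sigma2 = 4
print("mean ML estimate: ", draws[:, 1].mean())   # should be close to sigma2*(n-k)/n = 3.4
```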
7A.5 EViews Output of the Cobb-Douglas Production Function in Equation (7.9.4)

Dependent Variable: Y1
Method: Least Squares
Included observations: 51

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          3.887600      0.396228     9.811514      0.0000
Y2         0.468332      0.098926     4.734170      0.0000
Y3         0.521279      0.096887     5.380274      0.0000

R-squared             0.964175    Mean dependent var.      16.94139
Adjusted R-squared    0.962683    S.D. dependent var.      1.380870
S.E. of regression    0.266752    Akaike info criterion    0.252028
Sum squared resid.    3.415520    Schwarz criterion        0.365665
Log likelihood        -3.426721   Hannan-Quinn criterion   0.295452
F-statistic           645.9311    Durbin-Watson stat.      1.946387
Prob. (F-statistic)   0.000000

Covariance of Estimates

      C           Y2          Y3
C     0.156997    0.010364    -0.020014
Y2    0.010364    0.009786    -0.009205
Y3    -0.020014   -0.009205   0.009387
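Most of the summary statistics above can be reproduced from a few reported numbers. The sketch below uses only the Python standard library and the usual textbook formulas. For the information criteria it assumes EViews' per-observation scaling (−2 log L/n plus a penalty divided by n), which appears to match the printed values.

```python
import math

# Values reported in the EViews output above
n, k = 51, 3
rss, r2 = 3.415520, 0.964175
loglik = -3.426721
coef = [3.887600, 0.468332, 0.521279]
cov_diag = [0.156997, 0.009786, 0.009387]

se = [math.sqrt(v) for v in cov_diag]                # std. errors = sqrt of covariance diagonal
t_stats = [b / s for b, s in zip(coef, se)]          # t = coefficient / std. error
se_regression = math.sqrt(rss / (n - k))             # S.E. of regression
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)            # adjusted R-squared
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))       # overall F statistic
aic = -2 * loglik / n + 2 * k / n                    # assumed EViews per-observation scaling
sic = -2 * loglik / n + k * math.log(n) / n
hq = -2 * loglik / n + 2 * k * math.log(math.log(n)) / n

print([round(t, 4) for t in t_stats], round(se_regression, 6), round(adj_r2, 6))
print(round(f_stat, 2), round(aic, 6), round(sic, 6), round(hq, 6))
```

Small discrepancies in the last digits come from the rounding of the printed inputs.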


Y             X2          X3           Y1        Y2        Y3        Y1HAT     Y1RESID
38,372,840    424,471     2,689,076    17.4629   12.9586   14.8047   17.6739   -0.2110
1,805,427     19,895      57,997       14.4063   9.8982    10.9681   14.2407   0.1656
23,736,129    206,893     2,308,272    16.9825   12.2400   14.6520   17.2577   -0.2752
26,981,983    304,055     1,376,235    17.1107   12.6250   14.1349   17.1685   -0.0578
217,546,032   1,809,756   13,554,116   19.1979   14.4087   16.4222   19.1962   0.0017
19,462,751    180,366     1,790,751    16.7840   12.1027   14.3981   17.0612   -0.2771
28,972,772    224,267     1,210,229    17.1819   12.3206   14.0063   16.9589   0.2229
14,313,157    54,455      421,064      16.4767   10.9051   12.9505   15.7457   0.7310
159,921       2,029       7,188        11.9824   7.6153    8.8802    12.0831   -0.1007
47,289,846    471,211     2,761,281    17.6718   13.0631   14.8312   17.7366   -0.0648
63,015,125    659,379     3,540,475    17.9589   13.3991   15.0798   18.0236   -0.0647
1,809,052     17,528      146,371      14.4083   9.7716    11.8939   14.6640   -0.2557
10,511,786    75,414      848,220      16.1680   11.2307   13.6509   16.2632   -0.0952
105,324,866   963,156     5,870,409    18.4726   13.7780   15.5854   18.4646   0.0079
90,120,459    835,083     5,832,503    18.3167   13.6353   15.5790   18.3944   -0.0778
39,079,550    336,159     1,795,976    17.4811   12.7253   14.4011   17.3543   0.1269
22,826,760    246,144     1,595,118    16.9434   12.4137   14.2825   17.1465   -0.2030
38,686,340    384,484     2,503,693    17.4710   12.8597   14.7333   17.5903   -0.1193
69,910,555    216,149     4,726,625    18.0627   12.2837   15.3687   17.6519   0.4109
7,856,947     82,021      415,131      15.8769   11.3147   12.9363   15.9301   -0.0532
21,352,966    174,855     1,729,116    16.8767   12.0717   14.3631   17.0284   -0.1517
46,044,292    355,701     2,706,065    17.6451   12.7818   14.8110   17.5944   0.0507
92,335,528    943,298     5,294,356    18.3409   13.7571   15.4822   18.4010   -0.0601
48,304,274    456,553     2,833,525    17.6930   13.0315   14.8570   17.7353   -0.0423
17,207,903    267,806     1,212,281    16.6609   12.4980   14.0080   17.0429   -0.3820
47,340,157    439,427     2,404,122    17.6729   12.9932   14.6927   17.6317   0.0411
2,644,567     24,167      334,008      14.7880   10.0927   12.7189   15.2445   -0.4564
14,650,080    163,637     627,806      16.5000   12.0054   13.3500   16.4692   0.0308
7,290,360     59,737      522,335      15.8021   10.9977   13.1661   15.9014   -0.0993
9,188,322     96,106      507,488      16.0334   11.4732   13.1372   16.1090   -0.0756
51,298,516    407,076     3,295,056    17.7532   12.9168   15.0079   17.7603   -0.0071
20,401,410    43,079      404,749      16.8311   10.6708   12.9110   15.6153   1.2158
87,756,129    727,177     4,260,353    18.2901   13.4969   15.2649   18.1659   0.1242
101,268,432   820,013     4,086,558    18.4333   13.6171   15.2232   18.2005   0.2328
3,556,025     34,723      184,700      15.0842   10.4552   12.1265   15.1054   -0.0212
124,986,166   1,174,540   6,301,421    18.6437   13.9764   15.6563   18.5945   0.0492
20,451,196    201,284     1,327,353    16.8336   12.2125   14.0987   16.9564   -0.1229
34,808,109    257,820     1,456,683    17.3654   12.4600   14.1917   17.1208   0.2445
104,858,322   944,998     5,896,392    18.4681   13.7589   15.5899   18.4580   0.0101
6,541,356     68,987      297,618      15.6937   11.1417   12.6036   15.6756   0.0181
37,668,126    400,317     2,500,071    17.4443   12.9000   14.7318   17.6085   -0.1642
4,988,905     56,524      311,251      15.4227   10.9424   12.6484   15.6056   -0.1829
62,828,100    582,241     4,126,465    17.9559   13.2746   15.2329   18.0451   -0.0892
172,960,157   1,120,382   11,588,283   18.9686   13.9292   16.2655   18.8899   0.0786
15,702,637    150,030     762,671      16.5693   11.9186   13.5446   16.5300   0.0394
5,418,786     48,134      276,293      15.5054   10.7817   12.5292   15.4683   0.0371
49,166,991    425,346     2,731,669    17.7107   12.9607   14.8204   17.6831   0.0277
46,164,427    313,279     1,945,860    17.6477   12.6548   14.4812   17.3630   0.2847
9,185,967     89,639      685,587      16.0332   11.4035   13.4380   16.2332   -0.2000
66,964,978    694,628     3,902,823    18.0197   13.4511   15.1772   18.0988   -0.0791
2,979,475     15,221      361,536      14.9073   9.6304    12.7981   15.0692   -0.1620

Note: Y1 = ln Y, Y2 = ln X2, and Y3 = ln X3. Y1HAT is the fitted value of Y1 and Y1RESID = Y1 − Y1HAT.








Chapter 8

Multiple Regression Analysis: The Problem of Inference

This chapter, a continuation of Chapter 5, extends the ideas of interval estimation and hypothesis
testing developed there to models involving three or more variables. Although in
many ways the concepts developed in Chapter 5 can be applied straightforwardly to the
multiple regression model, a few additional features are unique to such models, and it is
these features that will receive more attention in this chapter.


8.1 The Normality Assumption Once Again 

We know by now that if our sole objective is point estimation of the parameters of the
regression models, the method of ordinary least squares (OLS), which does not make any
assumption about the probability distribution of the disturbances $u_i$, will suffice. But if our
objective is estimation as well as inference, then, as argued in Chapters 4 and 5, we need to
assume that the $u_i$ follow some probability distribution.

For reasons already clearly spelled out, we assumed that the $u_i$ follow the normal distribution
with zero mean and constant variance $\sigma^2$. We continue to make the same assumption
for multiple regression models. With the normality assumption and following the
discussion of Chapters 4 and 7, we find that the OLS estimators of the partial regression
coefficients, which are identical with the maximum likelihood (ML) estimators, are best
linear unbiased estimators (BLUE).¹ Moreover, the estimators $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_3$ are themselves
normally distributed with means equal to the true $\beta_1$, $\beta_2$, and $\beta_3$ and the variances given
in Chapter 7. Furthermore, $(n - 3)\hat\sigma^2/\sigma^2$ follows the $\chi^2$ distribution with n − 3 df, and the
three OLS estimators are distributed independently of $\hat\sigma^2$. The proofs follow the two-variable
case discussed in Appendix 3A, Section 3A. As a result, and following Chapter 5,
one can show that, upon replacing $\sigma^2$ by its unbiased estimator $\hat\sigma^2$ in the computation of the
standard errors, each of the following variables

$$t = \frac{\hat\beta_1 - \beta_1}{\mathrm{se}(\hat\beta_1)} \qquad (8.1.1)$$

$$t = \frac{\hat\beta_2 - \beta_2}{\mathrm{se}(\hat\beta_2)} \qquad (8.1.2)$$

$$t = \frac{\hat\beta_3 - \beta_3}{\mathrm{se}(\hat\beta_3)} \qquad (8.1.3)$$

follows the t distribution with n − 3 df.

¹With the normality assumption, the OLS estimators $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_3$ are minimum-variance estimators
in the entire class of unbiased estimators, whether linear or not. In short, they are BUE (best unbiased
estimators). See C. R. Rao, Linear Statistical Inference and Its Applications, John Wiley & Sons, New
York, 1965, p. 258.

Note that the df are now n − 3 because in computing $\sum \hat u_i^2$ and hence $\hat\sigma^2$ we first need
to estimate the three partial regression coefficients, which therefore put three restrictions
on the residual sum of squares (RSS). Following this logic, in the four-variable case there
will be n − 4 df, and so on. Therefore, the t distribution can be used to establish confidence
intervals as well as test statistical hypotheses about the true population partial regression
coefficients. Similarly, the $\chi^2$ distribution can be used to test hypotheses about the
true $\sigma^2$. To demonstrate the actual mechanics, we use the following illustrative example.


EXAMPLE 8.1
Child Mortality Example Revisited

In Chapter 7 we regressed child mortality (CM) on per capita GNP (PGNP) and the female
literacy rate (FLR) for a sample of 64 countries. The regression results given in Eq. (7.6.2)
are reproduced below with some additional information:

$$\widehat{\text{CM}}_i = 263.6416 - 0.0056\,\text{PGNP}_i - 2.2316\,\text{FLR}_i$$

se = (11.5932)   (0.0019)   (0.2099)
t = (22.7411)   (-2.8187)   (-10.6293)      (8.1.4)
p value = (0.0000)*   (0.0065)   (0.0000)*

$R^2 = 0.7077 \qquad \bar R^2 = 0.6981$

where * denotes extremely low value.

In Eq. (8.1.4) we have followed the format first introduced in Eq. (5.11.1), where the
figures in the first set of parentheses are the estimated standard errors, those in the second
set are the t values under the null hypothesis that the relevant population coefficient
has a value of zero, and those in the third are the estimated p values. Also given are $R^2$ and
adjusted $R^2$ values. We have already interpreted this regression in Example 7.1.

What about the statistical significance of the observed results? Consider, for example,
the coefficient of PGNP of -0.0056. Is this coefficient statistically significant, that is,
statistically different from zero? Likewise, is the coefficient of FLR of -2.2316 statistically
significant? Are both coefficients statistically significant? To answer these and related
questions, let us first consider the kinds of hypothesis testing that one may encounter in the
context of a multiple regression model.


8.2 Hypothesis Testing in Multiple Regression: General Comments

Once we go beyond the simple world of the two-variable linear regression model, hypothesis
testing assumes several interesting forms, such as the following:

1. Testing hypotheses about an individual partial regression coefficient (Section 8.3).

2. Testing the overall significance of the estimated multiple regression model, that is, finding
out if all the partial slope coefficients are simultaneously equal to zero (Section 8.4).

3. Testing that two or more coefficients are equal to one another (Section 8.5).

4. Testing that the partial regression coefficients satisfy certain restrictions (Section 8.6).

5. Testing the stability of the estimated regression model over time or in different
cross-sectional units (Section 8.7).

6. Testing the functional form of regression models (Section 8.8).

Since testing of one or more of these types occurs so commonly in empirical analysis, we
devote a section to each type.


8.3 Hypothesis Testing about Individual Regression Coefficients

If we invoke the assumption that $u_i \sim N(0, \sigma^2)$, then, as noted in Section 8.1, we can use
the t test to test a hypothesis about any individual partial regression coefficient. To illustrate
the mechanics, consider the child mortality regression, Eq. (8.1.4). Let us postulate that

$$H_0: \beta_2 = 0 \quad \text{and} \quad H_1: \beta_2 \neq 0$$

The null hypothesis states that, with $X_3$ (female literacy rate) held constant, $X_2$ (PGNP)
has no (linear) influence on Y (child mortality).² To test the null hypothesis, we use the t test
given in Eq. (8.1.2). Following Chapter 5 (see Table 5.1), if the computed t value exceeds
the critical t value (in absolute value) at the chosen level of significance, we may reject the
null hypothesis; otherwise, we may not reject it. For our illustrative example, using Eq. (8.1.2)
and noting that $\beta_2 = 0$ under the null hypothesis, we obtain


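$$t = \frac{-0.0056}{0.0020} \approx -2.8187 \qquad (8.3.1)$$

as shown in Eq. (8.1.4). (The value -2.8187 is computed from the unrounded coefficient and standard error.)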

Notice that we have 64 observations. Therefore, the degrees of freedom in this example
are 61 (why?). If you refer to the t table given in Appendix D, we do not have data
corresponding to 61 df. The closest we have are for 60 df. If we use these df and assume α, the
level of significance (i.e., the probability of committing a Type I error), of 5 percent, the
critical t value is 2.0 for a two-tail test (look up $t_{\alpha/2}$ for 60 df) or 1.671 for a one-tail test
(look up $t_\alpha$ for 60 df).

For our example, the alternative hypothesis is two-sided. Therefore, we use the two-tail
t value. Since the computed t value of 2.8187 (in absolute terms) exceeds the critical t value
of 2, we can reject the null hypothesis that PGNP has no effect on child mortality. To put it
more positively, with the female literacy rate held constant, per capita GNP has a significant
(negative) effect on child mortality, as one would expect a priori. Graphically, the situation
is as shown in Figure 8.1.

In practice, one does not have to assume a particular value of α to conduct hypothesis
testing. One can simply use the p value given in Eq. (8.1.4), which in the present case is
0.0065. The interpretation of this p value (i.e., the exact level of significance) is that if the
null hypothesis were true, the probability of obtaining a t value of 2.8187 or greater
(in absolute terms) would be only 0.0065, or 0.65 percent. That is indeed a small probability,
much smaller than the artificially adopted value of α = 5%.


²In most empirical investigations the null hypothesis is stated in this form, that is, taking the extreme
position (a kind of straw man) that there is no relationship between the dependent variable and the
explanatory variable under consideration. The idea here is to find out whether the relationship
between the two is a trivial one to begin with.
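The p value of 0.0065 for PGNP can be reproduced without a t table. The standard-library sketch below integrates the Student's t density with Simpson's rule; `t_pdf` and `two_tail_p` are illustrative names. In practice one would call a statistics library instead (for example, SciPy's `scipy.stats.t.sf`), so treat this as a sketch of what such a routine computes.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tail_p(t_value, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral from 0 to |t| of the density (Simpson, steps even)."""
    b = abs(t_value)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

p = two_tail_p(-2.8187, df=61)   # t for PGNP from Eq. (8.1.4), with n - 3 = 61 df
print(round(p, 4))
```

The result should come out close to the 0.0065 reported in Eq. (8.1.4). Likewise, `two_tail_p(2.0, 60)` is close to 0.05, consistent with the critical value of 2.0 for a two-tail 5 percent test at 60 df.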
This example provides us an opportunity to decide whether we want to use a one-tail
or a two-tail t test. Since a priori child mortality and per capita GNP are expected to be
negatively related (why?), we should use the one-tail test. That is, our null and alternative
hypotheses should be:

$$H_0: \beta_2 \ge 0 \quad \text{and} \quad H_1: \beta_2 < 0$$

As the reader knows by now, we can reject the null hypothesis on the basis of the one-tail
t test in the present instance: the computed t of -2.8187 lies well beyond the one-tail critical
value of -1.671. If we can reject the null hypothesis in a two-sided test, we will have enough
evidence to reject it in the one-sided scenario as long as the test statistic has the sign specified
by the alternative hypothesis.

In Chapter 5 we saw the intimate connection between hypothesis testing and confidence
interval estimation. For our example, the 95 percent confidence interval for $\beta_2$ is:

$$\hat\beta_2 - t_{\alpha/2}\,\mathrm{se}(\hat\beta_2) \le \beta_2 \le \hat\beta_2 + t_{\alpha/2}\,\mathrm{se}(\hat\beta_2)$$

which in our example becomes

$$-0.0056 - 2(0.0020) \le \beta_2 \le -0.0056 + 2(0.0020)$$

that is,

$$-0.0096 \le \beta_2 \le -0.0016 \qquad (8.3.2)$$

that is, the interval -0.0096 to -0.0016 includes the true $\beta_2$ coefficient with a 95 percent
confidence coefficient. Thus, if 100 samples of size 64 are selected and 100 confidence
intervals like Eq. (8.3.2) are constructed, we expect about 95 of them to contain the true population
parameter $\beta_2$. Since the interval (8.3.2) does not include the null-hypothesized value of
zero, we can reject the null hypothesis that the true $\beta_2$ is zero at the 5 percent level of significance.
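The interval arithmetic and the repeated-sampling interpretation can both be sketched in NumPy. The first call reproduces the endpoints of (8.3.2). The `coverage` function is a hypothetical simulation, not the child mortality data. It fixes a set of regressors and draws many samples of size 64 from an assumed true model. For each sample it builds the interval with t = 2 and records whether the interval contains the true β2, then reports the share that do.

```python
import numpy as np

def conf_interval(b, se, t_crit):
    """Return (b - t_crit*se, b + t_crit*se)."""
    return b - t_crit * se, b + t_crit * se

print(conf_interval(-0.0056, 0.0020, 2.0))   # endpoints of Eq. (8.3.2), up to float rounding

def coverage(n_samples=100, n=64, t_crit=2.0, seed=0):
    """Share of n_samples simulated intervals for beta_2 that contain the true beta_2."""
    rng = np.random.default_rng(seed)
    beta = np.array([263.6, -0.0056, -2.23])          # HYPOTHETICAL "true" parameters
    X = np.column_stack([np.ones(n),
                         rng.uniform(100.0, 20000.0, n),   # HYPOTHETICAL regressors,
                         rng.uniform(10.0, 95.0, n)])      # held fixed across samples
    XtX_inv = np.linalg.inv(X.T @ X)
    hits = 0
    for _ in range(n_samples):
        y = X @ beta + rng.normal(0.0, 40.0, n)            # HYPOTHETICAL sigma = 40
        b = XtX_inv @ X.T @ y                              # OLS estimates
        s2 = np.sum((y - X @ b) ** 2) / (n - 3)            # unbiased estimate of sigma^2
        lo, hi = conf_interval(b[1], np.sqrt(s2 * XtX_inv[1, 1]), t_crit)
        hits += int(lo <= beta[1] <= hi)
    return hits / n_samples

print(coverage(n_samples=100))   # near 0.95; the exact share depends on the seed
```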

Thus, whether we use the t test of significance as in (8.3.1) or the confidence interval
estimation as in (8.3.2), we reach the same conclusion. However, this should not be
surprising in view of the close connection between confidence interval estimation and
hypothesis testing.

Following the procedure just described, we can test hypotheses about the other parameters
of our child mortality regression model. The necessary data are already provided in
Eq. (8.1.4). For example, suppose we want to test the hypothesis that, with the influence of
PGNP held constant, the female literacy rate has no effect whatsoever on child mortality. We
can confidently reject this hypothesis, for under this null hypothesis the p value of obtaining
an absolute t value of as much as 10.6 or greater is practically zero.

Before moving on, remember that the t-testing procedure is based on the assumption
that the error term $u_i$ follows the normal distribution. Although we cannot directly observe
the $u_i$, we can observe their proxy, the $\hat u_i$, that is, the residuals. For our mortality regression,
the histogram of the residuals is as shown in Figure 8.2.

FIGURE 8.2
Histogram of residuals from regression (8.1.4). Series: Residuals; maximum 96.80276, minimum -84.26686, std. dev. 41.07980, skewness 0.227575, kurtosis 2.948855.

From the histogram it seems that the residuals are approximately normally distributed. We can also
compute the Jarque-Bera (JB) test of normality, as shown in Eq. (5.12.1). In our case the
JB value is 0.5594 with a p value of 0.76.³ Therefore, it seems that the error term in our
example follows the normal distribution. Of course, keep in mind that the JB test is a
large-sample test and our sample of 64 observations is not necessarily large.
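The JB value above follows directly from the skewness (0.227575) and kurtosis (2.948855) of the residuals. The sketch below uses the formula JB = n[S²/6 + (K − 3)²/24]. It also uses the fact that the upper tail of a χ² variable with 2 df is exp(−x/2). `jarque_bera` is an illustrative name.

```python
import math

def jarque_bera(n, skewness, kurtosis):
    """JB = n * [S^2/6 + (K - 3)^2/24]; under normality JB is approximately chi-square(2)."""
    jb = n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)
    p_value = math.exp(-jb / 2)   # closed-form upper tail of chi-square with 2 df
    return jb, p_value

jb, p = jarque_bera(64, 0.227575, 2.948855)
print(round(jb, 4), round(p, 2))   # → 0.5594 0.76
```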

8.4 Testing the Overall Significance of the Sample Regression

Throughout the previous section we were concerned with testing the significance of the
estimated partial regression coefficients individually, that is, under the separate hypothesis
that each true population partial regression coefficient was zero. But now consider the
following hypothesis:

$$H_0: \beta_2 = \beta_3 = 0 \qquad (8.4.1)$$

This null hypothesis is a joint hypothesis that $\beta_2$ and $\beta_3$ are jointly or simultaneously equal
to zero. A test of such a hypothesis is called a test of the overall significance of the observed
or estimated regression line, that is, whether Y is linearly related to both $X_2$ and $X_3$.

Can the joint hypothesis in Eq. (8.4.1) be tested by testing the significance of $\hat\beta_2$ and $\hat\beta_3$
individually as in Section 8.3? The answer is no, and the reasoning is as follows.

In testing the individual significance of an observed partial regression coefficient in
Section 8.3, we assumed implicitly that each test of significance was based on a different
(i.e., independent) sample. Thus, in testing the significance of $\hat\beta_2$ under the hypothesis that
$\beta_2 = 0$, it was assumed tacitly that the testing was based on a different sample from the one
used in testing the significance of $\hat\beta_3$ under the null hypothesis that $\beta_3 = 0$. But to test the joint
hypothesis of Eq. (8.4.1), if we use the same sample data, we shall be violating the
assumption underlying the test procedure.⁴ The matter can be put differently: In Eq. (8.3.2)
we established a 95 percent confidence interval for $\beta_2$. But if we use the same sample data
to establish a confidence interval for $\beta_3$, say, with a confidence coefficient of 95 percent, we
cannot assert that both $\beta_2$ and $\beta_3$ lie in their respective confidence intervals with a
probability of $(1 - \alpha)(1 - \alpha) = (0.95)(0.95)$.

³For our example, the skewness value is 0.2276 and the kurtosis value is 2.9488. Recall that for a
normally distributed variable the skewness and kurtosis values are, respectively, 0 and 3.

⁴In any given sample the $\mathrm{cov}(\hat\beta_2, \hat\beta_3)$ may not be zero; that is, $\hat\beta_2$ and $\hat\beta_3$ may be correlated. See
Eq. (7.4.17).

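In other words, although the statements

$$\Pr\left[\hat\beta_2 - t_{\alpha/2}\,\mathrm{se}(\hat\beta_2) \le \beta_2 \le \hat\beta_2 + t_{\alpha/2}\,\mathrm{se}(\hat\beta_2)\right] = 1 - \alpha$$
$$\Pr\left[\hat\beta_3 - t_{\alpha/2}\,\mathrm{se}(\hat\beta_3) \le \beta_3 \le \hat\beta_3 + t_{\alpha/2}\,\mathrm{se}(\hat\beta_3)\right] = 1 - \alpha$$

are individually true, it is not true that the probability that the intervals

$$\left[\hat\beta_2 \pm t_{\alpha/2}\,\mathrm{se}(\hat\beta_2),\ \hat\beta_3 \pm t_{\alpha/2}\,\mathrm{se}(\hat\beta_3)\right]$$

simultaneously include $\beta_2$ and $\beta_3$ is $(1-\alpha)^2$, because the intervals may not be independent when the same data are used to derive them. To state the matter differently,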

... testing a series of single [individual] hypotheses is not equivalent to testing those same 
hypotheses jointly. The intuitive reason for this is that in a joint test of several hypotheses any 
single hypothesis is “affected” by the information in the other hypotheses. 5 

The upshot of the preceding argument is that for a given example (sample) only one confidence interval or only one test of significance can be obtained. How, then, does one test the simultaneous null hypothesis that $\beta_2 = \beta_3 = 0$? The answer follows.

The Analysis of Variance Approach to Testing the Overall
Significance of an Observed Multiple Regression: The F Test

For reasons just explained, we cannot use the usual t test to test the joint hypothesis that the 
true partial slope coefficients are zero simultaneously. However, this joint hypothesis can be 
tested by the analysis of variance (ANOVA) technique first introduced in Section 5.9, 
which can be demonstrated as follows. 

Recall the identity 

$$\sum y_i^2 = \hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i} + \sum \hat u_i^2 \qquad (8.4.2)$$

TSS = ESS + RSS

TSS has, as usual, n − 1 df and RSS has n − 3 df for reasons already discussed. ESS has 2 df since it is a function of $\hat\beta_2$ and $\hat\beta_3$. Therefore, following the ANOVA procedure discussed in Section 5.9, we can set up Table 8.1.

Now it can be shown⁶ that, under the assumption of normal distribution for $u_i$ and the null hypothesis $\beta_2 = \beta_3 = 0$, the variable

$$F = \frac{\left(\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}\right)/2}{\sum \hat u_i^2/(n-3)} = \frac{\text{ESS/df}}{\text{RSS/df}} \qquad (8.4.3)$$

is distributed as the F distribution with 2 and n − 3 df.
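To see the decomposition (8.4.2) and the F ratio (8.4.3) at work, here is a minimal Python sketch on a small hypothetical data set (the numbers are invented for illustration and do not come from the text). It computes $\hat\beta_2$ and $\hat\beta_3$ from the deviation-form normal equations, checks that TSS = ESS + RSS, and forms F with 2 and n − 3 df. The function names are our own.

```python
# Hypothetical illustrative data (not from the text): Y, X2, X3 for n = 8 units.
Y = [10.0, 12.0, 15.0, 14.0, 18.0, 20.0, 21.0, 25.0]
X2 = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 7.0]
X3 = [5.0, 4.0, 6.0, 3.0, 5.0, 7.0, 4.0, 6.0]


def deviations(v):
    """Deviations from the sample mean (the lowercase y, x2, x3 of the text)."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def three_variable_anova(Y, X2, X3):
    n = len(Y)
    y, x2, x3 = deviations(Y), deviations(X2), deviations(X3)
    s22 = sum(a * a for a in x2)
    s33 = sum(a * a for a in x3)
    s23 = sum(a * b for a, b in zip(x2, x3))
    sy2 = sum(a * b for a, b in zip(y, x2))
    sy3 = sum(a * b for a, b in zip(y, x3))
    det = s22 * s33 - s23 ** 2            # nonzero unless X2 and X3 are collinear
    b2 = (sy2 * s33 - sy3 * s23) / det
    b3 = (sy3 * s22 - sy2 * s23) / det
    tss = sum(a * a for a in y)           # sum of y_i^2
    ess = b2 * sy2 + b3 * sy3             # b2*sum(y x2) + b3*sum(y x3)
    resid = [yi - b2 * a - b3 * b for yi, a, b in zip(y, x2, x3)]
    rss = sum(e * e for e in resid)       # sum of squared residuals
    f_stat = (ess / 2) / (rss / (n - 3))  # Eq. (8.4.3): 2 and n - 3 df
    return b2, b3, tss, ess, rss, f_stat


b2, b3, tss, ess, rss, f_stat = three_variable_anova(Y, X2, X3)
print(f"b2={b2:.4f} b3={b3:.4f} TSS={tss:.4f} ESS+RSS={ess + rss:.4f} F={f_stat:.4f}")
```

The identity TSS = ESS + RSS holds up to floating-point rounding because the residuals are orthogonal to $x_2$ and $x_3$ by the normal equations.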


5 Thomas B. Fomby, R. Carter Hill, and Stanley R. Johnson, Advanced Econometric Methods, Springer-Verlag, New York, 1984, p. 37.

6 See K. A. Brownlee, Statistical Theory and Methodology in Science and Engineering, John Wiley & Sons, 
New York, 1960, pp. 278-280. 
TABLE 8.1 ANOVA Table for the Three-Variable Regression

Source of Variation        SS                                                            df       MSS
Due to regression (ESS)    $\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}$   2        $\left(\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}\right)/2$
Due to residual (RSS)      $\sum \hat u_i^2$                                             n − 3    $\hat\sigma^2 = \sum \hat u_i^2/(n-3)$
Total                      $\sum y_i^2$                                                  n − 1

What use can be made of the preceding F ratio? It can be proved⁷ that under the assumption that $u_i \sim N(0, \sigma^2)$,

$$E\left(\frac{\sum \hat u_i^2}{n-3}\right) = E(\hat\sigma^2) = \sigma^2 \qquad (8.4.4)$$

With the additional assumption that $\beta_2 = \beta_3 = 0$, it can be shown that

$$E\left(\frac{\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}}{2}\right) = \sigma^2 \qquad (8.4.5)$$

Therefore, if the null hypothesis is true, both Eqs. (8.4.4) and (8.4.5) give identical estimates of true $\sigma^2$. This statement should not be surprising because if there is a trivial relationship between Y and $X_2$ and $X_3$, the sole source of variation in Y is due to the random forces represented by $u_i$. If, however, the null hypothesis is false, that is, $X_2$ and $X_3$ definitely influence Y, the equality between Eqs. (8.4.4) and (8.4.5) will not hold. In this case, the ESS will be relatively larger than the RSS, taking due account of their respective df. Therefore, the F value of Eq. (8.4.3) provides a test of the null hypothesis that the true slope coefficients are simultaneously zero. If the F value computed from Eq. (8.4.3) exceeds the critical F value from the F table at the α percent level of significance, we reject $H_0$; otherwise we do not reject it. Alternatively, if the p value of the observed F is sufficiently low, we can reject $H_0$.

Table 8.2 summarizes the F test. Turning to our illustrative example, we obtain the ANOVA table, as shown in Table 8.3.

TABLE 8.2 A Summary of the F Statistic

Null Hypothesis $H_0$        Alternative Hypothesis $H_1$    Critical Region: Reject $H_0$ If
$\sigma_1^2 = \sigma_2^2$    $\sigma_1^2 > \sigma_2^2$       $S_1^2/S_2^2 > F_{\alpha,\,\text{ndf},\,\text{ddf}}$
$\sigma_1^2 = \sigma_2^2$    $\sigma_1^2 \ne \sigma_2^2$     $S_1^2/S_2^2 > F_{\alpha/2,\,\text{ndf},\,\text{ddf}}$ or $< F_{(1-\alpha/2),\,\text{ndf},\,\text{ddf}}$

Notes:
1. $\sigma_1^2$ and $\sigma_2^2$ are the two population variances.
2. $S_1^2$ and $S_2^2$ are the two sample variances.
3. ndf and ddf denote, respectively, the numerator and denominator df.
4. In computing the F ratio, put the larger $S^2$ value in the numerator.
5. The critical F values are given in the last column. The first subscript of F is the level of significance and the second subscripts are the numerator and denominator df.
6. Note that $F_{(1-\alpha/2),\,\text{ndf},\,\text{ddf}} = 1/F_{\alpha/2,\,\text{ddf},\,\text{ndf}}$.


7 See K. A. Brownlee, Statistical Theory and Methodology in Science and Engineering , John Wiley & Sons, 
New York, 1960, pp. 278-280. 
TABLE 8.3 ANOVA Table for the Child Mortality Example

Source of Variation    SS          df    MSS
Due to regression      257,362.4   2     128,681.2
Due to residuals       106,315.6   61    1742.88
Total                  363,678     63



Using Eq. (8.4.3), we obtain

$$F = \frac{128{,}681.2}{1742.88} = 73.8325 \qquad (8.4.6)$$


The p value of obtaining an F value of as much as 73.8325 or greater is almost zero, leading to the rejection of the hypothesis that together PGNP and FLR have no effect on child mortality. If you were to use the conventional 5 percent level-of-significance value, the critical F value for 2 df in the numerator and 60 df in the denominator (the actual df, however, are 61) is about 3.15, or about 4.98 if you were to use the 1 percent level of significance. Obviously, the observed F of about 74 far exceeds any of these critical F values.
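When the numerator has exactly 2 df, as in the three-variable model, the F distribution has a closed-form upper tail, $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, so the p value and the critical values quoted above can be checked without tables. A minimal Python sketch; the helper names are hypothetical, and both formulas are valid only for 2 numerator df.

```python
def f2_sf(f: float, d2: int) -> float:
    """P(F > f) for an F(2, d2) variable (closed form, numerator df = 2 only)."""
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


def f2_critical(alpha: float, d2: int) -> float:
    """Upper-alpha critical value of F(2, d2), obtained by inverting f2_sf."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


# Mean sums of squares from Table 8.3 (child mortality example).
f_obs = 128_681.2 / 1742.88           # Eq. (8.4.6)
p_value = f2_sf(f_obs, 61)            # actual denominator df = 61
crit_5 = f2_critical(0.05, 60)        # the tabulated (2, 60) values used in the text
crit_1 = f2_critical(0.01, 60)
print(f"F = {f_obs:.4f}, p = {p_value:.2e}, "
      f"F_0.05(2,60) = {crit_5:.2f}, F_0.01(2,60) = {crit_1:.2f}")
```

The critical values come out at about 3.15 and 4.98, and the p value is of the order of $10^{-17}$, which is what "almost zero" means here.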

We can generalize the preceding F-testing procedure as follows. 


Testing the Overall Significance of a Multiple
Regression: The F Test


Given the k-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$

To test the hypothesis

$$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$$

(i.e., all slope coefficients are simultaneously zero) versus

$H_1$: Not all slope coefficients are simultaneously zero

compute

$$F = \frac{\text{ESS/df}}{\text{RSS/df}} = \frac{\text{ESS}/(k-1)}{\text{RSS}/(n-k)} \qquad (8.4.7)$$

If $F > F_\alpha(k-1, n-k)$, reject $H_0$; otherwise you do not reject it, where $F_\alpha(k-1, n-k)$ is the critical F value at the α level of significance and (k − 1) numerator df and (n − k) denominator df. Alternatively, if the p value of F obtained from Eq. (8.4.7) is sufficiently low, one can reject $H_0$.


Needless to say, in the three-variable case (Y and $X_2$, $X_3$) k is 3, in the four-variable case k is 4, and so on.

In passing, note that most regression packages routinely calculate the F value (given in the analysis of variance table) along with the usual regression output, such as the estimated coefficients, their standard errors, t values, etc. The null hypothesis for the t computation is usually assumed to be $\beta_i = 0$.
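For a general k-variable model, the p value of Eq. (8.4.7) needs the upper tail of the F distribution, which can be written with the regularized incomplete beta function: $P(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\ d_1/2)$ with $d_1 = k - 1$ and $d_2 = n - k$. The sketch below evaluates it with the standard continued-fraction expansion (modified Lentz method), using only the Python standard library. It is an illustrative implementation with our own function names, not a substitute for a tested routine such as SciPy's `scipy.stats.f.sf`.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),           # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # the last update barely changed h
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F variable with (d1, d2) df."""
    if f <= 0.0:
        return 1.0
    return regularized_incomplete_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def overall_f_test(ess, rss, n, k):
    """Eq. (8.4.7): F = [ESS/(k - 1)] / [RSS/(n - k)] together with its p value."""
    f = (ess / (k - 1)) / (rss / (n - k))
    return f, f_sf(f, k - 1, n - k)


# Child mortality example (Table 8.3): n = 64 countries, k = 3 parameters.
f_cm, p_cm = overall_f_test(257_362.4, 106_315.6, n=64, k=3)
print(f"F = {f_cm:.4f} with (2, 61) df, p = {p_cm:.3g}")
```

Two easy sanity checks: with 2 numerator df the result must match the closed form $(1 + 2f/d_2)^{-d_2/2}$, and with equal numerator and denominator df, $P(F > 1) = 0.5$.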
Individual versus Joint Testing of Hypotheses

In Section 8.3 we discussed the test of significance of a single regression coefficient, and in Section 8.4 we have discussed the joint or overall test of significance of the estimated regression (i.e., all slope coefficients are simultaneously equal to zero). We reiterate that these tests are different. Thus, on the basis of the t test or confidence interval (of Section 8.3) it is possible to accept the hypothesis that a particular slope coefficient, $\beta_k$, is zero, and yet reject the joint hypothesis that all slope coefficients are zero.

The lesson to be learned is that the joint “message” of individual confidence intervals is no 
substitute for a joint confidence region [implied by the F test] in performing joint tests of 
hypotheses and making joint confidence statements. 8 


An Important Relationship between $R^2$ and F

There is an intimate relationship between the coefficient of determination $R^2$ and the F test used in the analysis of variance. Assuming the normal distribution for the disturbances $u_i$ and the null hypothesis that $\beta_2 = \beta_3 = 0$, we have seen that

$$F = \frac{\text{ESS}/2}{\text{RSS}/(n-3)} \qquad (8.4.8)$$

is distributed as the F distribution with 2 and n − 3 df.

More generally, in the k-variable case (including intercept), if we assume that the disturbances are normally distributed and that the null hypothesis is

$$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0 \qquad (8.4.9)$$

then the statistic

$$F = \frac{\text{ESS}/(k-1)}{\text{RSS}/(n-k)} \qquad (8.4.7) = (8.4.10)$$

follows the F distribution with k − 1 and n − k df. (Note: The total number of parameters to be estimated is k, of which 1 is the intercept term.)

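Let us manipulate Eq. (8.4.10) as follows:

$$F = \frac{n-k}{k-1}\,\frac{\text{ESS}}{\text{RSS}} = \frac{n-k}{k-1}\,\frac{\text{ESS}}{\text{TSS} - \text{ESS}} = \frac{n-k}{k-1}\,\frac{\text{ESS}/\text{TSS}}{1 - (\text{ESS}/\text{TSS})} = \frac{n-k}{k-1}\,\frac{R^2}{1 - R^2} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \qquad (8.4.11)$$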


8 Fomby et al., op. cit., p. 42. 
TABLE 8.4 ANOVA Table in Terms of $R^2$

Source of Variation    SS                         df       MSS*
Due to regression      $R^2(\sum y_i^2)$          2        $R^2(\sum y_i^2)/2$
Due to residuals       $(1-R^2)(\sum y_i^2)$      n − 3    $(1-R^2)(\sum y_i^2)/(n-3)$
Total                  $\sum y_i^2$               n − 1

*Note that in computing the F value there is no need to multiply $R^2$ and $(1-R^2)$ by $\sum y_i^2$ because it drops out, as shown in Eq. (8.4.12).


where use is made of the definition $R^2 = \text{ESS}/\text{TSS}$. Equation (8.4.11) shows how F and $R^2$ are related. These two vary directly. When $R^2 = 0$, F is zero ipso facto. The larger the $R^2$, the greater the F value. In the limit, when $R^2 = 1$, F is infinite. Thus the F test, which is a measure of the overall significance of the estimated regression, is also a test of significance of $R^2$. In other words, testing the null hypothesis in Eq. (8.4.9) is equivalent to testing the null hypothesis that (the population) $R^2$ is zero.

For the three-variable case, Eq. (8.4.11) becomes

$$F = \frac{R^2/2}{(1-R^2)/(n-3)} \qquad (8.4.12)$$


By virtue of the close connection between F and R 2 , the ANOVA Table (Table 8.1) can be 
recast as Table 8.4. 

For our illustrative example, using Eq. (8.4.12) we obtain:

$$F = \frac{0.7077/2}{(1 - 0.7077)/61} \approx 73.84$$

which is about the same as obtained before, except for the rounding errors.

One advantage of the F test expressed in terms of $R^2$ is its ease of computation: All that one needs to know is the $R^2$ value. Therefore, the overall F test of significance given in Eq. (8.4.7) can be recast in terms of $R^2$ as shown in Table 8.4.


Testing the Overall Significance of a Multiple
Regression in Terms of $R^2$


Testing the overall significance of a regression in terms of $R^2$: Alternative but equivalent test to Eq. (8.4.7).

Given the k-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$

To test the hypothesis

$$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$$

versus

$H_1$: Not all slope coefficients are simultaneously zero

compute

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \qquad (8.4.13)$$

If $F > F_{\alpha(k-1,\,n-k)}$, reject $H_0$; otherwise you do not reject $H_0$, where $F_{\alpha(k-1,\,n-k)}$ is the critical F value at the α level of significance and (k − 1) numerator df and (n − k) denominator df. Alternatively, if the p value of F obtained from Eq. (8.4.13) is sufficiently low, reject $H_0$.
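A quick numerical check that the two forms of the test agree: the sketch below (Python, hypothetical helper names) computes F from the sums of squares in Table 8.3 and from $R^2$. With $R^2$ computed exactly as ESS/TSS the two coincide; with $R^2$ rounded to 0.7077, F comes out at about 73.84, which is the rounding effect mentioned in the text.

```python
def f_from_r2(r2, n, k):
    """Eq. (8.4.13): F = [R^2/(k - 1)] / [(1 - R^2)/(n - k)]."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


def f_from_ss(ess, rss, n, k):
    """Eq. (8.4.7): F = [ESS/(k - 1)] / [RSS/(n - k)]."""
    return (ess / (k - 1)) / (rss / (n - k))


ESS, RSS = 257_362.4, 106_315.6           # Table 8.3
r2_exact = ESS / (ESS + RSS)              # R^2 = ESS/TSS
f_rounded = f_from_r2(0.7077, n=64, k=3)  # R^2 rounded to four decimals
print(f"{f_from_ss(ESS, RSS, 64, 3):.4f} {f_from_r2(r2_exact, 64, 3):.4f} {f_rounded:.4f}")
```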
Before moving on, return to Example 7.5 in Chapter 7. From regression (7.10.7) we observe that RGDP (relative per capita GDP) and RGDP squared explain only about 10.92 percent of the variation in GDPG (GDP growth rate) in a sample of 190 countries. This $R^2$ of 0.1092 seems a “low” value. Is it really statistically different from zero? How do we find that out?

Recall our earlier discussion in “An Important Relationship between $R^2$ and F” about the relationship between $R^2$ and the F value as given in Eq. (8.4.11) or Eq. (8.4.12) for the specific case of two regressors. As noted, if $R^2$ is zero, then F is zero ipso facto, which will be the case if the regressors have no impact whatsoever on the regressand. Therefore, if we insert $R^2 = 0.1092$ into formula (8.4.12), we obtain


$$F = \frac{0.1092/2}{(1 - 0.1092)/187} = 11.4618 \qquad (8.4.13)$$


Under the null hypothesis that $R^2 = 0$, the preceding F value follows the F distribution with 2 and 187 df in the numerator and denominator, respectively. (Note: There are 190 observations and two regressors.) From the F table we see that this F value is significant even at the 1 percent level; the p value is actually about 0.00002. Therefore, we can reject the null hypothesis that the two regressors have no impact on the regressand, notwithstanding the fact that the $R^2$ is only 0.1092.
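The GDPG calculation can be reproduced in a few lines. Because the numerator has 2 df, the p value again follows from the closed form $(1 + 2F/d_2)^{-d_2/2}$; this shortcut is an assumption of the sketch that holds only for 2 numerator df.

```python
# Example 7.5: R^2 = 0.1092, n = 190 countries, k = 3 parameters
# (intercept, RGDP, RGDP squared).
r2, n, k = 0.1092, 190, 3
d1, d2 = k - 1, n - k                       # (2, 187) df
f_obs = (r2 / d1) / ((1.0 - r2) / d2)       # Eq. (8.4.12)
# With 2 numerator df the F tail has the closed form (1 + 2f/d2)^(-d2/2).
p_value = (1.0 + 2.0 * f_obs / d2) ** (-d2 / 2.0)
print(f"F({d1}, {d2}) = {f_obs:.4f}, p = {p_value:.1e}")
```

It gives F of about 11.46 and a p value of about $2 \times 10^{-5}$, which illustrates that a low $R^2$ can still be highly significant in a large cross section.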

This example brings out an important empirical observation: in cross-sectional data involving several observations, one generally obtains a low $R^2$ because of the diversity of the cross-sectional units. Therefore, one should not be surprised or worried about finding low $R^2$ values in cross-sectional regressions. What is relevant is that the model is correctly specified, that the regressors have the correct (i.e., theoretically expected) signs, and that (hopefully) the regression coefficients are statistically significant. The reader should check that individually both of the regressors in Eq. (7.10.7) are statistically significant at the 5 percent or better level (i.e., lower than 5 percent).


The "Incremental" or "Marginal" Contribution 
of an Explanatory Variable 

In Chapter 7 we stated that generally we cannot allocate the $R^2$ value among the various regressors. In our child mortality example we found that the $R^2$ was 0.7077, but we cannot say what part of this value is due to the regressor PGNP and what part is due to female literacy rate (FLR) because of possible correlation between the two regressors in the sample at hand. We can shed more light on this using the analysis of variance technique.

For our illustrative example we found that individually $X_2$ (PGNP) and $X_3$ (FLR) were statistically significant on the basis of (separate) t tests. We have also found that on the basis of the F test collectively both the regressors have a significant effect on the regressand Y (child mortality).

Now suppose we introduce PGNP and FLR sequentially; that is, we first regress child mortality on PGNP and assess its significance and then add FLR to the model to find out whether it contributes anything (of course, the order in which PGNP and FLR enter can be reversed). By contribution we mean whether the addition of the variable to the model increases ESS (and hence $R^2$) “significantly” in relation to the RSS. This contribution may appropriately be called the incremental, or marginal, contribution of an explanatory variable.

The topic of incremental contribution is an important one in practice. In most empirical 
investigations the researcher may not be completely sure whether it is worth adding an X 
variable to the model knowing that several other X variables are already present in the 
model. One does not wish to include a variable(s) that contributes very little toward ESS. 
By the same token, one does not want to exclude a variable(s) that substantially increases 
TABLE 8.5 ANOVA Table for Regression Equation (8.4.14)

Source of Variation    SS          df    MSS
ESS (due to PGNP)      60,449.5    1     60,449.5
RSS                    303,228.5   62    4890.7822
Total                  363,678     63



ESS. But how does one decide whether an X variable significantly reduces RSS? The analysis of variance technique can be easily extended to answer this question.

Suppose we first regress child mortality on PGNP and obtain the following regression:

$$\widehat{\text{CM}}_i = 157.4244 - 0.0114\,\text{PGNP}_i \qquad (8.4.14)$$

t = (15.9894)  (−3.5156)
p value = (0.0000)  (0.0008)
$r^2 = 0.1662$    adj $r^2 = 0.1528$


As these results show, PGNP has a significant effect on CM. The ANOVA table corresponding to the preceding regression is given in Table 8.5.

Assuming the disturbances $u_i$ are normally distributed and the hypothesis that PGNP has no effect on CM, we obtain the F value of

$$F = \frac{60{,}449.5}{4890.7822} = 12.3598 \qquad (8.4.15)$$


which follows the F distribution with 1 and 62 df. This F value is highly significant, as the computed p value is 0.0008. Thus, as before, we reject the hypothesis that PGNP has no effect on CM. Incidentally, note that $t^2 = (-3.5156)^2 = 12.3594$, which is approximately the same as the F value of Eq. (8.4.15), where the t value is obtained from Eq. (8.4.14). But this should not be surprising in view of the fact that the square of the t statistic with n df is equal to the F value with 1 df in the numerator and n df in the denominator, a relationship first established in Chapter 5. Note that in the present example there are 64 observations, so the t statistic has 62 df and F has 1 and 62 df.
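The identity $t^2 = F$ can be checked directly from the printed output; the small discrepancy in the fourth decimal comes from rounding of the reported figures. A minimal Python sketch using only numbers from Eq. (8.4.14) and Table 8.5:

```python
t_pgnp = -3.5156                    # t value of PGNP in Eq. (8.4.14)
ess_pgnp = 60_449.5                 # Table 8.5
rss_pgnp, df_resid = 303_228.5, 62  # RSS and its df (n - 2 with n = 64)
f_pgnp = (ess_pgnp / 1) / (rss_pgnp / df_resid)   # Eq. (8.4.15)
print(f"t^2 = {t_pgnp ** 2:.4f}, F = {f_pgnp:.4f}")
```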

Having run the regression (8.4.14), let us suppose we decide to add FLR to the model and obtain the multiple regression (8.1.4). The questions we want to answer are:

1. What is the marginal, or incremental, contribution of FLR, knowing that PGNP is 
already in the model and that it is significantly related to CM? 

2. Is the incremental contribution of FLR statistically significant? 

3. What is the criterion for adding variables to the model? 

The preceding questions can be answered by the ANOVA technique. To see this, let us construct Table 8.6. In this table $X_2$ refers to PGNP and $X_3$ refers to FLR.

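To assess the incremental contribution of $X_3$ after allowing for the contribution of $X_2$, we form

$$F = \frac{Q_2/\text{df}}{Q_4/\text{df}} = \frac{(\text{ESS}_{\text{new}} - \text{ESS}_{\text{old}})/\text{number of new regressors}}{\text{RSS}_{\text{new}}/\text{df}\ (= n - \text{number of parameters in the new model})} = \frac{Q_2/1}{Q_4/61}\ \text{for our example} \qquad (8.4.16)$$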
TABLE 8.6 ANOVA Table to Assess Incremental Contribution of a Variable(s)

Source of Variation                 SS                                                                  df       MSS
ESS due to $X_2$ alone              $Q_1 = \hat\beta_{12}^2 \sum x_{2i}^2$                              1        $Q_1/1$
ESS due to the addition of $X_3$    $Q_2 = Q_3 - Q_1$                                                   1        $Q_2/1$
ESS due to both $X_2$, $X_3$        $Q_3 = \hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}$   2        $Q_3/2$
RSS                                 $Q_4 = Q_5 - Q_3$                                                   n − 3    $Q_4/(n-3)$
Total                               $Q_5 = \sum y_i^2$                                                  n − 1


where $\text{ESS}_{\text{new}}$ = ESS under the new model (i.e., after adding the new regressors, $= Q_3$), $\text{ESS}_{\text{old}}$ = ESS under the old model ($= Q_1$), and $\text{RSS}_{\text{new}}$ = RSS under the new model (i.e., after taking into account all the regressors, $= Q_4$). For our illustrative example the results are as shown in Table 8.7.

Now applying Eq. (8.4.16), we obtain:

$$F = \frac{196{,}912.9}{1742.8786} = 112.9814 \qquad (8.4.17)$$


Under the usual assumptions, this F value follows the F distribution with 1 and 61 df. The reader should check that this F value is highly significant, suggesting that the addition of FLR to the model significantly increases ESS and hence the $R^2$ value. Therefore, FLR should be added to the model. Again, note that if you square the t-statistic value of the FLR coefficient in the multiple regression (8.1.4), which is $(-10.6293)^2$, you will obtain the F value of Eq. (8.4.17), save for the rounding errors.
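Equation (8.4.16) is simple enough to code directly from the entries of Table 8.7. The sketch below (hypothetical function name) computes the incremental F for FLR and compares it with the square of FLR's t value in regression (8.1.4).

```python
def incremental_f(ess_new, ess_old, rss_new, n_new_regressors, df_resid_new):
    """Eq. (8.4.16): [(ESS_new - ESS_old)/m] / [RSS_new/df]."""
    return ((ess_new - ess_old) / n_new_regressors) / (rss_new / df_resid_new)


# Table 8.7: Q3 = ESS with PGNP and FLR, Q1 = ESS with PGNP alone, Q4 = RSS (61 df).
f_flr = incremental_f(257_362.4, 60_449.5, 106_315.6, n_new_regressors=1, df_resid_new=61)
t_flr = -10.6293                    # t value of FLR in regression (8.1.4)
print(f"F = {f_flr:.4f}, t^2 = {t_flr ** 2:.4f}")
```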

Incidentally, the F ratio of Eq. (8.4.16) can be recast by using the $R^2$ values only, as we did in Eq. (8.4.13). As Exercise 8.2 shows, the F ratio of Eq. (8.4.16) is equivalent to the following F ratio:⁹


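$$F = \frac{(R^2_{\text{new}} - R^2_{\text{old}})/\text{df}}{(1 - R^2_{\text{new}})/\text{df}} = \frac{(R^2_{\text{new}} - R^2_{\text{old}})/\text{number of new regressors}}{(1 - R^2_{\text{new}})/\text{df}\ (= n - \text{number of parameters in the new model})} \qquad (8.4.18)$$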


TABLE 8.7 ANOVA Table for the Illustrative Example: Incremental Analysis

Source of Variation               SS          df    MSS
ESS due to PGNP                   60,449.5    1     60,449.5
ESS due to the addition of FLR    196,912.9   1     196,912.9
ESS due to PGNP and FLR           257,362.4   2     128,681.2
RSS                               106,315.6   61    1742.8786
Total                             363,678     63

9 The F test of Eq. (8.4.18) is a special case of the more general F test given in Eq. (8.6.9) or Eq. (8.6.10) in Section 8.6.
This F ratio follows the F distribution with the appropriate numerator and denominator df, 1 and 61 in our illustrative example.

For our example, $R^2_{\text{new}} = 0.7077$ (from Eq. [8.1.4]) and $R^2_{\text{old}} = 0.1662$ (from Eq. [8.4.14]). Therefore,

$$F = \frac{(0.7077 - 0.1662)/1}{(1 - 0.7077)/61} = 113.01 \qquad (8.4.19)$$


which is about the same as that obtained from Eq. (8.4.17), except for the rounding errors. 
This F is highly significant, reinforcing our earlier finding that the variable FLR belongs in 
the model. 
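The $R^2$ form of the incremental test, Eq. (8.4.18), can be sketched the same way. The function name and argument names below are our own; the calculation assumes, as the text requires, that both regressions have the same dependent variable. With the child mortality figures it returns about 113.0, in line with Eq. (8.4.17) up to rounding.

```python
def incremental_f_r2(r2_new, r2_old, n_new_regressors, n, n_params_new):
    """Eq. (8.4.18): F from R^2 values (same dependent variable in both models)."""
    num = (r2_new - r2_old) / n_new_regressors
    den = (1.0 - r2_new) / (n - n_params_new)
    return num / den


f_r2 = incremental_f_r2(0.7077, 0.1662, n_new_regressors=1, n=64, n_params_new=3)
print(f"F = {f_r2:.2f}")
```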

A cautionary note: If you use the $R^2$ version of the F test given in Eq. (8.4.11), make sure that the dependent variable in the new and the old models is the same. If they are different, use the F test given in Eq. (8.4.16).


When to Add a New Variable 

The F-test procedure just outlined provides a formal method of deciding whether a variable should be added to a regression model. Often researchers are faced with the task of choosing from several competing models involving the same dependent variable but with different explanatory variables. As a matter of ad hoc choice (because very often the theoretical foundation of the analysis is weak), these researchers frequently choose the model that gives the highest adjusted $R^2$, $\bar R^2$. Therefore, if the inclusion of a variable increases $\bar R^2$, it is retained in the model although it does not reduce RSS significantly in the statistical sense. The question then becomes: When does the adjusted $R^2$ increase? It can be shown that $\bar R^2$ will increase if the t value of the coefficient of the newly added variable is larger than 1 in absolute value, where the t value is computed under the hypothesis that the population value of the said coefficient is zero (i.e., the t value computed from Eq. [5.3.2] under the hypothesis that the true β value is zero).¹⁰ The preceding criterion can also be stated differently: $\bar R^2$ will increase with the addition of an extra explanatory variable only if the $F\ (= t^2)$ value of that variable exceeds 1.

Applying either criterion, the FLR variable in our child mortality example, with a t value of −10.6293 or an F value of 112.9814, should increase $\bar R^2$, which indeed it does: when FLR is added to the model, $\bar R^2$ increases from 0.1528 to 0.6981.


When to Add a Group of Variables 

Can we develop a similar rule for deciding whether it is worth adding (or dropping) a group of variables from a model? The answer should be apparent from Eq. (8.4.18): If adding (dropping) a group of variables to the model gives an F value greater (less) than 1, $\bar R^2$ will increase (decrease). Of course, from Eq. (8.4.18) one can easily find out whether the addition (subtraction) of a group of variables significantly increases (decreases) the explanatory power of a regression model.
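The link between $\bar R^2$ and the F value can be checked numerically with $\bar R^2 = 1 - (1 - R^2)(n - 1)/(n - k)$. The Python sketch below reproduces the child mortality figures and then uses an invented case (R² rising from 0.30 to 0.31 with n = 30; these numbers are hypothetical, not from the text) in which the added regressor has F below 1, so $\bar R^2$ falls. Helper names are our own.

```python
def adjusted_r2(r2, n, k):
    """R-bar^2 = 1 - (1 - R^2)(n - 1)/(n - k), k = parameters incl. intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


def added_group_f(r2_new, r2_old, m, n, k_new):
    """Eq. (8.4.18) for m added regressors."""
    return ((r2_new - r2_old) / m) / ((1.0 - r2_new) / (n - k_new))


# Child mortality: PGNP only (k = 2) versus PGNP and FLR (k = 3), n = 64.
adj_old = adjusted_r2(0.1662, 64, 2)
adj_new = adjusted_r2(0.7077, 64, 3)

# Hypothetical case (invented numbers): one extra regressor raises R^2 only slightly.
f_small = added_group_f(0.31, 0.30, m=1, n=30, k_new=4)
adj_change = adjusted_r2(0.31, 30, 4) - adjusted_r2(0.30, 30, 3)
print(f"{adj_old:.4f} -> {adj_new:.4f}; F = {f_small:.3f}, "
      f"change in R-bar^2 = {adj_change:.4f}")
```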


8.5 Testing the Equality of Two Regression Coefficients 

Suppose in the multiple regression

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i \qquad (8.5.1)$$


10 For proof, see Dennis J. Aigner, Basic Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1971, pp. 91-92.
we want to test the hypotheses

$$H_0: \beta_3 = \beta_4 \quad \text{or} \quad (\beta_3 - \beta_4) = 0$$
$$H_1: \beta_3 \ne \beta_4 \quad \text{or} \quad (\beta_3 - \beta_4) \ne 0 \qquad (8.5.2)$$

that is, under $H_0$ the two slope coefficients $\beta_3$ and $\beta_4$ are equal.

Such a null hypothesis is of practical importance. For example, let Eq. (8.5.1) represent the demand function for a commodity where Y = amount of a commodity demanded, $X_2$ = price of the commodity, $X_3$ = income of the consumer, and $X_4$ = wealth of the consumer. The null hypothesis in this case means that the income and wealth coefficients are the same. Or, if $Y_i$ and the X's are expressed in logarithmic form, the null hypothesis in Eq. (8.5.2) implies that the income and wealth elasticities of consumption are the same. (Why?)

How do we test such a null hypothesis? Under the classical assumptions, it can be shown that

$$t = \frac{(\hat\beta_3 - \hat\beta_4) - (\beta_3 - \beta_4)}{\mathrm{se}(\hat\beta_3 - \hat\beta_4)} \qquad (8.5.3)$$

follows the t distribution with (n − 4) df because Eq. (8.5.1) is a four-variable model or, more generally, with (n − k) df, where k is the total number of parameters estimated, including the constant term. The $\mathrm{se}(\hat\beta_3 - \hat\beta_4)$ is obtained from the following well-known formula (see Appendix A for details):


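$$\mathrm{se}(\hat\beta_3 - \hat\beta_4) = \sqrt{\mathrm{var}(\hat\beta_3) + \mathrm{var}(\hat\beta_4) - 2\,\mathrm{cov}(\hat\beta_3, \hat\beta_4)} \qquad (8.5.4)$$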


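If we substitute the null hypothesis and the expression for the $\mathrm{se}(\hat\beta_3 - \hat\beta_4)$ into Eq. (8.5.3), our test statistic becomes

$$t = \frac{\hat\beta_3 - \hat\beta_4}{\sqrt{\mathrm{var}(\hat\beta_3) + \mathrm{var}(\hat\beta_4) - 2\,\mathrm{cov}(\hat\beta_3, \hat\beta_4)}} \qquad (8.5.5)$$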


Now the testing procedure involves the following steps:

1. Estimate $\hat\beta_3$ and $\hat\beta_4$. Any standard computer package can do that.

2. Most standard computer packages routinely compute the variances and covariances of the estimated parameters.¹¹ From these estimates the standard error in the denominator of Eq. (8.5.5) can be easily obtained.

3. Obtain the t ratio from Eq. (8.5.5). Note the null hypothesis in the present case is $(\beta_3 - \beta_4) = 0$.

4. If the t value computed from Eq. (8.5.5) exceeds, in absolute value, the critical t value at the designated level of significance for given df, then you can reject the null hypothesis; otherwise, you do not reject it. Alternatively, if the p value of the t statistic from Eq. (8.5.5) is reasonably low, one can reject the null hypothesis. Note that the lower the p value, the greater the evidence against the null hypothesis. Therefore, when we say that a p value is low or reasonably low, we mean that it is less than the significance level, such as 10, 5, or 1 percent. Some personal judgment is involved in this decision.


11 The algebraic expression for the covariance formula is rather involved. Appendix C provides a compact expression for it, however, using matrix notation.
EXAMPLE 8.2 The Cubic Cost Function Revisited

Recall the cubic total cost function estimated in Example 7.4, Section 7.10, which for convenience is reproduced below:

$$\hat Y_i = 141.7667 + 63.4777X_i - 12.9615X_i^2 + 0.9396X_i^3 \qquad (7.10.6)$$

se = (6.3753)  (4.7786)  (0.9857)  (0.0591)
$\mathrm{cov}(\hat\beta_3, \hat\beta_4) = -0.0576$; $R^2 = 0.9983$

where Y is total cost and X is output, and where the figures in parentheses are the estimated standard errors.

Suppose we want to test the hypothesis that the coefficients of the $X^2$ and $X^3$ terms in the cubic cost function are the same, that is, $\beta_3 = \beta_4$ or $(\beta_3 - \beta_4) = 0$. In the regression (7.10.6) we have all the necessary output to conduct the t test of Eq. (8.5.5). The actual mechanics are as follows:

$$t = \frac{\hat\beta_3 - \hat\beta_4}{\sqrt{\mathrm{var}(\hat\beta_3) + \mathrm{var}(\hat\beta_4) - 2\,\mathrm{cov}(\hat\beta_3, \hat\beta_4)}} = \frac{-12.9615 - 0.9396}{\sqrt{(0.9857)^2 + (0.0591)^2 - 2(-0.0576)}} = \frac{-13.9011}{1.0442} = -13.3130$$

The reader can verify that for 6 df (why?) the observed t value exceeds, in absolute value, the critical t value even at the 0.002 (or 0.2 percent) level of significance (two-tail test); the p value is extremely small, 0.000006. Hence we can reject the hypothesis that the coefficients of $X^2$ and $X^3$ in the cubic cost function are identical.
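The arithmetic of Eqs. (8.5.4) and (8.5.5) for Example 8.2 can be sketched in a few lines of Python. The function name is hypothetical; the inputs are the coefficients, standard errors, and covariance printed with regression (7.10.6).

```python
import math


def t_equal_coefficients(b_a, b_b, var_a, var_b, cov_ab):
    """Eq. (8.5.5): t for H0: beta_a = beta_b, from the estimates' var and cov."""
    se_diff = math.sqrt(var_a + var_b - 2.0 * cov_ab)   # Eq. (8.5.4)
    return (b_a - b_b) / se_diff, se_diff


# Regression (7.10.6): coefficients of X^2 and X^3, their se's, and their covariance.
t_cost, se_cost = t_equal_coefficients(
    -12.9615, 0.9396, var_a=0.9857 ** 2, var_b=0.0591 ** 2, cov_ab=-0.0576
)
print(f"se = {se_cost:.4f}, t = {t_cost:.4f}")
```

Note that the covariance enters with a minus sign here because the hypothesis concerns a difference of coefficients; for a sum, as in Eq. (8.6.4), it enters with a plus sign.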


8.6 Restricted Least Squares: Testing Linear Equality Restrictions 

There are occasions where economic theory may suggest that the coefficients in a regression 
model satisfy some linear equality restrictions. For instance, consider the Cobb-Douglas 
production function: 

$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i} \qquad (7.9.1) = (8.6.1)$$

where Y = output, $X_2$ = labor input, and $X_3$ = capital input. Written in log form, the equation becomes

$$\ln Y_i = \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \qquad (8.6.2)$$

where $\beta_0 = \ln \beta_1$.

Now if there are constant returns to scale (equiproportional change in output for an equiproportional change in the inputs), economic theory would suggest that

$$\beta_2 + \beta_3 = 1 \qquad (8.6.3)$$

which is an example of a linear equality restriction.¹²

How does one find out if there are constant returns to scale, that is, if the restriction 
(8.6.3) is valid? There are two approaches. 


12 If we had $\beta_2 + \beta_3 < 1$, this relation would be an example of a linear inequality restriction. To handle such restrictions, one needs to use mathematical programming techniques.
The t-Test Approach

The simplest procedure is to estimate Eq. (8.6.2) in the usual manner without taking into account the restriction (8.6.3) explicitly. This is called the unrestricted or unconstrained regression. Having estimated $\beta_2$ and $\beta_3$ (say, by the OLS method), a test of the hypothesis or restriction (8.6.3) can be conducted by the t test of Eq. (8.5.3), namely,

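$$t = \frac{(\hat\beta_2 + \hat\beta_3) - (\beta_2 + \beta_3)}{\mathrm{se}(\hat\beta_2 + \hat\beta_3)} = \frac{(\hat\beta_2 + \hat\beta_3) - 1}{\sqrt{\mathrm{var}(\hat\beta_2) + \mathrm{var}(\hat\beta_3) + 2\,\mathrm{cov}(\hat\beta_2, \hat\beta_3)}} \qquad (8.6.4)$$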

where $(\beta_2 + \beta_3) = 1$ under the null hypothesis and where the denominator is the standard error of $(\hat\beta_2 + \hat\beta_3)$. Then following Section 8.5, if the t value computed from Eq. (8.6.4) exceeds, in absolute value, the critical t value at the chosen level of significance, we reject the hypothesis of constant returns to scale; otherwise we do not reject it.

The F-Test Approach: Restricted Least Squares 

The preceding t test is a kind of postmortem examination because we try to find out whether the linear restriction is satisfied after estimating the “unrestricted” regression. A direct approach would be to incorporate the restriction (8.6.3) into the estimating procedure at the outset. In the present example, this procedure can be done easily. From (8.6.3) we see that

$$\beta_2 = 1 - \beta_3 \qquad (8.6.5)$$

or

$$\beta_3 = 1 - \beta_2 \qquad (8.6.6)$$

Therefore, using either of these equalities, we can eliminate one of the β coefficients in Eq. (8.6.2) and estimate the resulting equation. Thus, if we use Eq. (8.6.5), we can write the Cobb-Douglas production function as

$$\ln Y_i = \beta_0 + (1 - \beta_3)\ln X_{2i} + \beta_3 \ln X_{3i} + u_i = \beta_0 + \ln X_{2i} + \beta_3(\ln X_{3i} - \ln X_{2i}) + u_i$$

or

$$(\ln Y_i - \ln X_{2i}) = \beta_0 + \beta_3(\ln X_{3i} - \ln X_{2i}) + u_i \qquad (8.6.7)$$

or

$$\ln(Y_i/X_{2i}) = \beta_0 + \beta_3 \ln(X_{3i}/X_{2i}) + u_i \qquad (8.6.8)$$

where $(Y_i/X_{2i})$ = output/labor ratio and $(X_{3i}/X_{2i})$ = capital/labor ratio, quantities of great economic importance.

Notice how the original equation (8.6.2) is transformed. Once we estimate ft from 
Eq. (8.6.7) or Eq. (8.6.8), ft can be easily estimated from the relation (8.6.5). Needless to 
say, this procedure will guarantee that the sum of the estimated coefficients of the two inputs 
will equal 1. The procedure outlined in Eq. (8.6.7) or Eq. (8.6.8) is known as restricted 
least squares (RLS). This procedure can be generalized to models containing any number 
of explanatory variables and more than one linear equality restriction. The generalization 
can be found in Theil. 13 (See also general F testing below.) 


13 Henri Theil, Principles of Econometrics, John Wiley & Sons, New York, 1971, pp. 43-45. 
How do we compare the unrestricted and restricted least-squares regressions? In other words, how do we know that, say, the restriction (8.6.3) is valid? This question can be answered by applying the F test as follows. Let

$\sum \hat u_{UR}^2$ = RSS of the unrestricted regression (8.6.2)
$\sum \hat u_R^2$ = RSS of the restricted regression (8.6.7)
m = number of linear restrictions (1 in the present example)
k = number of parameters in the unrestricted regression
n = number of observations

Then,

$$F = \frac{(\text{RSS}_R - \text{RSS}_{UR})/m}{\text{RSS}_{UR}/(n-k)} = \frac{\left(\sum \hat u_R^2 - \sum \hat u_{UR}^2\right)/m}{\sum \hat u_{UR}^2/(n-k)} \qquad (8.6.9)$$

follows the F distribution with m, (n − k) df. (Note: UR and R stand for unrestricted and restricted, respectively.)

The F test above can also be expressed in terms of $R^2$ as follows:

$$F = \frac{(R^2_{UR} - R^2_R)/m}{(1 - R^2_{UR})/(n-k)} \qquad (8.6.10)$$

where $R^2_{UR}$ and $R^2_R$ are, respectively, the $R^2$ values obtained from the unrestricted and restricted regressions, that is, from the regressions (8.6.2) and (8.6.7). It should be noted that

$$R^2_{UR} \ge R^2_R \qquad (8.6.11)$$

and

$$\sum \hat u_R^2 \ge \sum \hat u_{UR}^2 \qquad (8.6.12)$$

In Exercise 8.4 you are asked to justify these statements.

A cautionary note: In using Eq. (8.6.10) keep in mind that if the dependent variable in the restricted and unrestricted models is not the same, $R^2_{UR}$ and $R^2_R$ are not directly comparable. In that case, use the procedure described in Chapter 7 to render the two $R^2$ values comparable (see Example 8.3 below) or use the F test given in Eq. (8.6.9).
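The t and F approaches to a linear restriction can be compared side by side. The Python sketch below uses a small hypothetical data set (invented numbers, not the Mexican data of Table 8.8) generated with elasticities summing to 1.1. It fits the unrestricted regression (8.6.2) by OLS, computes the t test (8.6.4) from the estimated variance-covariance matrix, fits the restricted regression (8.6.7), and forms the F test (8.6.9). For a single linear restriction (m = 1) the two tests are equivalent: the squared t value equals the F value, up to floating-point rounding. All function names are our own.

```python
import math

# Hypothetical log data (invented for illustration, not Table 8.8).
ln_l = [1.0, 1.2, 1.1, 1.5, 1.7, 1.6, 2.0, 2.1, 2.4, 2.3, 2.7, 2.9]  # ln X2 (labor)
ln_k = [2.0, 2.3, 2.1, 2.2, 2.6, 2.9, 2.8, 3.1, 3.0, 3.4, 3.5, 3.3]  # ln X3 (capital)
noise = [0.02, -0.01, 0.03, -0.02, 0.0, 0.01, -0.03, 0.02, -0.01, 0.01, -0.02, 0.02]
ln_y = [0.5 + 0.4 * a + 0.7 * b + e for a, b, e in zip(ln_l, ln_k, noise)]


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def ols_unrestricted(y, x2, x3):
    """OLS of y on (1, x2, x3): coefficients, RSS, and estimated var-cov matrix."""
    rows = [(1.0, a, b) for a, b in zip(x2, x3)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(3)]
    xtx_inv = inv3(xtx)
    beta = [sum(xtx_inv[p][q] * xty[q] for q in range(3)) for p in range(3)]
    fitted = [sum(bp * rp for bp, rp in zip(beta, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma2 = rss / (len(y) - 3)                  # unbiased estimate of sigma^2
    vcov = [[sigma2 * v for v in row] for row in xtx_inv]
    return beta, rss, vcov


def ols_restricted(y, x2, x3):
    """RLS under beta2 + beta3 = 1: regress (ln Y - ln X2) on (ln X3 - ln X2)."""
    z = [yi - a for yi, a in zip(y, x2)]
    w = [b - a for a, b in zip(x2, x3)]
    zbar, wbar = sum(z) / len(z), sum(w) / len(w)
    b3 = (sum((wi - wbar) * (zi - zbar) for wi, zi in zip(w, z))
          / sum((wi - wbar) ** 2 for wi in w))
    b0 = zbar - b3 * wbar
    rss = sum((zi - b0 - b3 * wi) ** 2 for zi, wi in zip(z, w))
    return [b0, 1.0 - b3, b3], rss               # beta2 recovered from Eq. (8.6.5)


n, k, m = len(ln_y), 3, 1
b_ur, rss_ur, vcov = ols_unrestricted(ln_y, ln_l, ln_k)
b_r, rss_r = ols_restricted(ln_y, ln_l, ln_k)

# t test of Eq. (8.6.4) on the unrestricted estimates.
se_sum = math.sqrt(vcov[1][1] + vcov[2][2] + 2.0 * vcov[1][2])
t_stat = (b_ur[1] + b_ur[2] - 1.0) / se_sum
# F test of Eq. (8.6.9) comparing restricted and unrestricted RSS.
f_stat = ((rss_r - rss_ur) / m) / (rss_ur / (n - k))
print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The restricted coefficients sum to 1 by construction, and the restricted RSS can never be smaller than the unrestricted one, which is the content of Eq. (8.6.12).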


EXAMPLE 8.3 The Cobb-Douglas Production Function for the Mexican Economy, 1955-1974


By way of illustrating the preceding discussion, consider the data given in Table 8.8. Attempting to fit the Cobb-Douglas production function to these data yielded the following results:

$$\widehat{\ln \text{GDP}}_t = -1.6524 + 0.3397 \ln \text{Labor}_t + 0.8460 \ln \text{Capital}_t \qquad (8.6.13)$$

t = (-2.7259) (1.8295) (9.0625)

p value = (0.0144) (0.0849) (0.0000)

$R^2 = 0.9951 \qquad RSS_{UR} = 0.0136$
TABLE 8.8
Real GDP, Employment, and Real Fixed Capital

Year    GDP*      Employment‡    Fixed Capital†
1955    114043     8310          182113
1956    120410     8529          193749
1957    129187     8738          205192
1958    134705     8952          215130
1959    139960     9171          225021
1960    150511     9569          237026
1961    157897     9527          248897
1962    165286     9662          260661
1963    178491    10334          275466
1964    199457    10981          295378
1965    212323    11746          315715
1966    226977    11521          337642
1967    241194    11540          363599
1968    260881    12066          391847
1969    277498    12297          422382
1970    296530    12955          455049
1971    306712    13338          484677
1972    329030    13738          520553
1973    354057    15924          561531
1974    374977    14154          609825

*Millions of 1960 pesos.
‡Thousands of people.
†Millions of 1960 pesos.

Source: Victor J. Elias, Sources of Growth: A Study of Seven Latin American Economies, International Center for Economic Growth, ICS Press, San Francisco, 1992. Data from Tables E5, E12, and E14.


where $RSS_{UR}$ is the unrestricted RSS, as we have put no restrictions on estimating Eq. (8.6.13).

We have already seen in Chapter 7 how to interpret the coefficients of the Cobb-Douglas production function. As you can see, the output/labor elasticity is about 0.34 and the output/capital elasticity is about 0.85. If we add these coefficients, we obtain 1.19, suggesting that perhaps the Mexican economy during the stated time period was experiencing increasing returns to scale. Of course, we do not know if 1.19 is statistically different from 1.

To see if that is the case, let us impose the restriction of constant returns to scale, which gives the following regression:

$$\widehat{\ln(\text{GDP}/\text{Labor})}_t = -0.4947 + 1.0153 \ln(\text{Capital}/\text{Labor})_t \qquad (8.6.14)$$

t = (-4.0612) (28.1056)
p value = (0.0007) (0.0000)

$R^2_R = 0.9777 \qquad RSS_R = 0.0166$

where $RSS_R$ is the restricted RSS, for we have imposed the restriction that there are constant returns to scale.




Since the dependent variable in the preceding two regressions is different, we have to use the F test given in Eq. (8.6.9). We have the necessary data to obtain the F value:

$$F = \frac{(RSS_R - RSS_{UR})/m}{RSS_{UR}/(n-k)} = \frac{(0.0166 - 0.0136)/1}{(0.0136)/(20 - 3)} = 3.75$$

Note in the present case m = 1, as we have imposed only one restriction, and (n - k) is 17, since we have 20 observations and three parameters in the unrestricted regression.

This F value follows the F distribution with 1 df in the numerator and 17 df in the denominator. The reader can easily check that this F value is not significant at the 5% level, where the critical value is $F_{0.05}(1, 17) = 4.45$. (See Appendix D, Table D.3.)

The conclusion then is that the Mexican economy was probably characterized by constant returns to scale over the sample period and therefore there may be no harm in using the restricted regression given in Eq. (8.6.14). As this regression shows, if the capital/labor ratio increased by 1 percent, on average, labor productivity went up by about 1 percent.
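The Example 8.3 computation can be reproduced from the transcribed Table 8.8. The sketch below is plain Python with no third-party packages; `ols_rss` is a hypothetical helper that solves the normal equations by Gauss-Jordan elimination, adequate for a model with a handful of regressors but not how production regression software works. The coefficients and RSS values should come out close to (8.6.13) and (8.6.14) if the table has been transcribed correctly; differences in the last digits would not be surprising, so no particular output is claimed here.

```python
import math

# Table 8.8 as transcribed: (year, real GDP, employment, real fixed capital).
TABLE_8_8 = [
    (1955, 114043, 8310, 182113), (1956, 120410, 8529, 193749),
    (1957, 129187, 8738, 205192), (1958, 134705, 8952, 215130),
    (1959, 139960, 9171, 225021), (1960, 150511, 9569, 237026),
    (1961, 157897, 9527, 248897), (1962, 165286, 9662, 260661),
    (1963, 178491, 10334, 275466), (1964, 199457, 10981, 295378),
    (1965, 212323, 11746, 315715), (1966, 226977, 11521, 337642),
    (1967, 241194, 11540, 363599), (1968, 260881, 12066, 391847),
    (1969, 277498, 12297, 422382), (1970, 296530, 12955, 455049),
    (1971, 306712, 13338, 484677), (1972, 329030, 13738, 520553),
    (1973, 354057, 15924, 561531), (1974, 374977, 14154, 609825),
]


def ols_rss(X, y):
    """OLS via the normal equations (X'X) b = X'y; returns (b, RSS).

    Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    """
    p = len(X[0])
    aug = [[sum(r[i] * r[j] for r in X) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(p):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    b = [aug[i][p] / aug[i][i] for i in range(p)]
    rss = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
              for r, yi in zip(X, y))
    return b, rss


ln_gdp = [math.log(g) for _, g, _, _ in TABLE_8_8]
ln_lab = [math.log(l) for _, _, l, _ in TABLE_8_8]
ln_cap = [math.log(c) for _, _, _, c in TABLE_8_8]

# Unrestricted model (8.6.13): ln GDP on a constant, ln Labor and ln Capital.
b_ur, rss_ur = ols_rss([[1.0, l, c] for l, c in zip(ln_lab, ln_cap)], ln_gdp)

# Restricted model (8.6.14): ln(GDP/Labor) on a constant and ln(Capital/Labor).
b_r, rss_r = ols_rss([[1.0, c - l] for l, c in zip(ln_lab, ln_cap)],
                     [g - l for g, l in zip(ln_gdp, ln_lab)])

n, k, m = len(TABLE_8_8), 3, 1
f_stat = ((rss_r - rss_ur) / m) / (rss_ur / (n - k))   # Eq. (8.6.9)
print("unrestricted:", [round(v, 4) for v in b_ur], "RSS =", round(rss_ur, 4))
print("restricted:  ", [round(v, 4) for v in b_r], "RSS =", round(rss_r, 4))
print("F =", round(f_stat, 2))
```

Because the restricted regression is the unrestricted one with constant returns to scale imposed, its residuals are those of a constrained fit of ln GDP, so its RSS can never fall below the unrestricted RSS. That is why (8.6.9) remains valid even though the two regressands look different.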


General F Testing 14 

The F test given in Eq. (8.6.10) or its equivalent in Eq. (8.6.9) provides a general method of testing hypotheses about one or more parameters of the k-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (8.6.15)$$

The F test of Eq. (8.4.16) or the t test of Eq. (8.5.3) is but a specific application of Eq. (8.6.10). Thus, hypotheses such as

$$H_0: \beta_2 = \beta_3 \qquad (8.6.16)$$

$$H_0: \beta_3 + \beta_4 + \beta_5 = 3 \qquad (8.6.17)$$

which involve some linear restrictions on the parameters of the k-variable model, or hypotheses such as

$$H_0: \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0 \qquad (8.6.18)$$

which imply that some regressors are absent from the model, can all be tested by the F test of Eq. (8.6.10).

From the discussion in Sections 8.4 and 8.6, the reader will have noticed that the general 
strategy of F testing is this: There is a larger model, the unconstrained model (8.6.15), and 
then there is a smaller model, the constrained or restricted model, which is obtained from 
the larger model by deleting some variables from it, e.g., Eq. (8.6.18), or by putting some 
linear restrictions on one or more coefficients of the larger model, e.g., Eq. (8.6.16) or 
Eq. (8.6.17). 

14 If one is using the maximum likelihood approach to estimation, then a test similar to the one discussed shortly is the likelihood ratio test, which is slightly involved and is therefore discussed in the appendix to the chapter. For further discussion, see Theil, op. cit., pp. 179-184.
We then fit the unconstrained and constrained models to the data and obtain the respective coefficients of determination, namely, $R^2_{UR}$ and $R^2_R$. We note the df in the unconstrained model (= n - k) and also the numerator df (= m), m being the number of linear restrictions (e.g., 1 in Eq. [8.6.16] or Eq. [8.6.17]) or the number of regressors omitted from the model (e.g., m = 4 if Eq. [8.6.18] holds, since four regressors are assumed to be absent from the model). We then compute the F ratio as indicated in Eq. (8.6.9) or Eq. (8.6.10) and use this Decision Rule: If the computed F exceeds $F_\alpha(m, n - k)$, where $F_\alpha(m, n - k)$ is the critical F at the α level of significance, we reject the null hypothesis; otherwise we do not reject it.

Let us illustrate:

EXAMPLE 8.4

The Demand for Chicken in the United States, 1960-1982

In Exercise 7.19, among other things, you were asked to consider the following demand function for chicken:

$$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + \beta_4 \ln X_{4t} + \beta_5 \ln X_{5t} + u_t \qquad (8.6.19)$$

where Y = per capita consumption of chicken, lb; $X_2$ = real disposable per capita income, in dollars; $X_3$ = real retail price of chicken per lb, in cents; $X_4$ = real retail price of pork per lb, in cents; and $X_5$ = real retail price of beef per lb, in cents.

In this model $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ are, respectively, the income, own-price, cross-price (pork), and cross-price (beef) elasticities. (Why?) According to economic theory,

$$
\begin{aligned}
\beta_2 &> 0 \\
\beta_3 &< 0 \\
\beta_4 &> 0 \text{ if chicken and pork are competing products} \\
        &< 0 \text{ if chicken and pork are complementary products} \qquad (8.6.20) \\
        &= 0 \text{ if chicken and pork are unrelated products} \\
\beta_5 &> 0 \text{ if chicken and beef are competing products} \\
        &< 0 \text{ if chicken and beef are complementary products} \\
        &= 0 \text{ if chicken and beef are unrelated products}
\end{aligned}
$$

Suppose someone maintains that chicken and pork and beef are unrelated products in the sense that chicken consumption is not affected by the prices of pork and beef. In short,

$$H_0: \beta_4 = \beta_5 = 0 \qquad (8.6.21)$$

Therefore, the constrained regression becomes

$$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + u_t \qquad (8.6.22)$$

Equation (8.6.19) is of course the unconstrained regression.

Using the data given in Exercise 7.19, we obtain the following:

Unconstrained regression:

$$\widehat{\ln Y}_t = 2.1898 + 0.3425 \ln X_{2t} - 0.5046 \ln X_{3t} + 0.1485 \ln X_{4t} + 0.0911 \ln X_{5t} \qquad (8.6.23)$$
(0.1557) (0.0833) (0.1109) (0.0997) (0.1007)

$R^2_{UR} = 0.9823$

Constrained regression:

$$\widehat{\ln Y}_t = 2.0328 + 0.4515 \ln X_{2t} - 0.3772 \ln X_{3t} \qquad (8.6.24)$$
(0.1162) (0.0247) (0.0635)

$R^2_R = 0.9801$




where the figures in parentheses are the estimated standard errors. Note: The $R^2$ values of Eqs. (8.6.23) and (8.6.24) are comparable since the dependent variable in the two models is the same.

Now the F ratio to test the hypothesis of Eq. (8.6.21) is

$$F = \frac{(R^2_{UR} - R^2_R)/m}{(1 - R^2_{UR})/(n-k)} \qquad (8.6.10)$$


The value of m in the present case is 2, since there are two restrictions involved: $\beta_4 = 0$ and $\beta_5 = 0$. The denominator df, (n - k), is 18, since n = 23 and k = 5 (5 β coefficients). Therefore, the F ratio is

$$F = \frac{(0.9823 - 0.9801)/2}{(1 - 0.9823)/18} = 1.1224 \qquad (8.6.25)$$

which has the F distribution with 2 and 18 df.

At 5 percent, clearly this F value is not statistically significant [$F_{0.05}(2, 18) = 3.55$]. The p value is 0.3472. Therefore, there is no reason to reject the null hypothesis: the demand for chicken does not depend on pork and beef prices. In short, we can accept the constrained regression (8.6.24) as representing the demand function for chicken.

Notice that the demand function satisfies a priori economic expectations in that the own-price elasticity is negative and that the income elasticity is positive. However, the estimated price elasticity, in absolute value, is statistically less than unity, implying that the demand for chicken is price inelastic. (Why?) Also, the income elasticity, although positive, is also statistically less than unity, suggesting that chicken is not a luxury item; by convention, an item is said to be a luxury item if its income elasticity is greater than 1.
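The decision rule stated before this example can be applied mechanically. The sketch below (hypothetical function names) recomputes (8.6.25) from the rounded $R^2$ values printed in (8.6.23) and (8.6.24) and compares the result with $F_{0.05}(2, 18) = 3.55$. With these rounded inputs the ratio comes out at about 1.119 rather than 1.1224, presumably because the text used unrounded $R^2$ values; the decision is the same. Because m = 2, the p value can be obtained from the closed form $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$ for an $F(2, \nu)$ variable, a shortcut that does not hold for other numerator df.

```python
def f_from_r2(r2_ur, r2_r, m, n, k):
    """F ratio of Eq. (8.6.10)."""
    return ((r2_ur - r2_r) / m) / ((1.0 - r2_ur) / (n - k))


def p_value_df1_is_2(f_stat, df_den):
    """Upper-tail probability of F(2, df_den): (1 + 2F/df_den) ** (-df_den/2).

    This closed form holds only when the numerator df equal 2.
    """
    return (1.0 + 2.0 * f_stat / df_den) ** (-df_den / 2.0)


def decide(f_stat, f_crit):
    """Decision rule: reject H0 only if the computed F exceeds the critical F."""
    return "reject H0" if f_stat > f_crit else "do not reject H0"


# Example 8.4: R^2 values from (8.6.23) and (8.6.24); m = 2, n = 23, k = 5.
f_stat = f_from_r2(0.9823, 0.9801, m=2, n=23, k=5)
p_val = p_value_df1_is_2(f_stat, df_den=18)
F_CRIT_5PCT = 3.55   # F_0.05(2, 18), as quoted in the text
print(round(f_stat, 4), round(p_val, 4), decide(f_stat, F_CRIT_5PCT))
```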


8.7 Testing for Structural or Parameter Stability of Regression 
Models: The Chow Test 


When we use a regression model involving time series data, it may happen that there is a 
structural change in the relationship between the regressand Y and the regressors. By 
structural change, we mean that the values of the parameters of the model do not remain the 
same through the entire time period. Sometimes the structural change may be due to external forces (e.g., the oil embargoes imposed by the OPEC oil cartel in 1973 and 1979 or the
Gulf War of 1990-1991), policy changes (such as the switch from a fixed exchange-rate 
system to a flexible exchange-rate system around 1973), actions taken by Congress (e.g., 
the tax changes initiated by President Reagan in his two terms in office or changes in the 
minimum wage rate), or a variety of other causes. 

How do we find out that a structural change has in fact occurred? To be specific, consider the data given in Table 8.9. This table gives data on disposable personal income and personal savings, in billions of dollars, for the United States for the period 1970-1995. Suppose we want to estimate a simple savings function that relates savings (Y) to disposable personal income DPI (X). Since we have the data, we can obtain an OLS regression of Y on X. But if we do that, we are maintaining that the relationship between savings and DPI has not changed much over the span of 26 years. That may be a tall assumption. For example, it is well known that in 1982 the United States suffered its worst peacetime recession. The civilian unemployment rate that year reached 9.7 percent, the highest since 1948. An event such as this might disturb the relationship between savings and DPI. To see if this happened, let us divide our sample data into two time periods: 1970-1981 and 1982-1995, the pre- and post-1982 recession periods.

TABLE 8.9
Savings and Personal Income (billions of dollars), United States, 1970-1995

Observation   Savings   Income     Observation   Savings   Income
1970           61.0      727.1     1983          167.0     2522.4
1971           68.6      790.2     1984          235.7     2810.0
1972           63.6      855.3     1985          206.2     3002.0
1973           89.6      965.0     1986          196.5     3187.6
1974           97.6     1054.2     1987          168.4     3363.1
1975          104.4     1159.2     1988          189.1     3640.8
1976           96.4     1273.0     1989          187.8     3894.5
1977           92.5     1401.4     1990          208.7     4166.8
1978          112.6     1580.1     1991          246.4     4343.7
1979          130.1     1769.5     1992          272.6     4613.7
1980          161.8     1973.3     1993          214.4     4790.2
1981          199.1     2200.2     1994          189.4     5021.7
1982          205.5     2347.3     1995          249.3     5320.8

Source: Economic Report of the President, 1997, Table B-28, p. 332.

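Now we have three possible regressions:

$$\text{Time period 1970-1981:}\quad Y_t = \lambda_1 + \lambda_2 X_t + u_{1t}, \qquad n_1 = 12 \qquad (8.7.1)$$

$$\text{Time period 1982-1995:}\quad Y_t = \gamma_1 + \gamma_2 X_t + u_{2t}, \qquad n_2 = 14 \qquad (8.7.2)$$

$$\text{Time period 1970-1995:}\quad Y_t = \alpha_1 + \alpha_2 X_t + u_t, \qquad n = (n_1 + n_2) = 26 \qquad (8.7.3)$$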

Regression (8.7.3) assumes that there is no difference between the two time periods and therefore estimates the relationship between savings and DPI for the entire time period consisting of 26 observations. In other words, this regression assumes that the intercept as well as the slope coefficient remains the same over the entire period; that is, there is no structural change. If this is in fact the situation, then $\alpha_1 = \lambda_1 = \gamma_1$ and $\alpha_2 = \lambda_2 = \gamma_2$.

Regressions (8.7.1) and (8.7.2) assume that the regressions in the two time periods are different; that is, the intercept and the slope coefficients are different, as indicated by the subscripted parameters. In the preceding regressions, the u's represent the error terms and the n's represent the number of observations.

For the data given in Table 8.9, the empirical counterparts of the preceding three regressions are as follows:

$$\hat Y_t = 1.0161 + 0.0803 X_t \qquad (8.7.1a)$$
t = (0.0873) (9.6015)
$R^2 = 0.9021 \qquad RSS_1 = 1785.032 \qquad df = 10$

$$\hat Y_t = 153.4947 + 0.0148 X_t \qquad (8.7.2a)$$
t = (4.6922) (1.7707)
$R^2 = 0.2971 \qquad RSS_2 = 10{,}005.22 \qquad df = 12$

$$\hat Y_t = 62.4226 + 0.0376 X_t \qquad (8.7.3a)$$
t = (4.8917) (8.8937)
$R^2 = 0.7672 \qquad RSS_3 = 23{,}248.30 \qquad df = 24$
FIGURE 8.3

In the preceding regressions, RSS denotes the residual sum of squares, and the figures in parentheses are the estimated t values.

A look at the estimated regressions suggests that the relationship between savings and DPI is not the same in the two subperiods. The slope in the preceding savings-income regressions represents the marginal propensity to save (MPS), that is, the (mean) change in savings as a result of a dollar's increase in disposable personal income. In the period 1970-1981 the MPS was about 0.08, whereas in the period 1982-1995 it was about 0.015. Whether this change was due to the economic policies pursued by President Reagan is hard to say. This difference further suggests that the pooled regression (8.7.3a), that is, the one that pools all the 26 observations and runs a common regression, disregarding possible differences in the two subperiods, may not be appropriate. Of course, the preceding statements need to be supported by an appropriate statistical test(s). Incidentally, the scattergrams and the estimated regression lines are as shown in Figure 8.3.

Now the possible differences, that is, structural changes, may be caused by differences in the intercept or the slope coefficient or both. How do we find that out? A visual feeling about this can be obtained as shown in Figure 8.3. But it would be useful to have a formal test. This is where the Chow test comes in handy. 15 This test assumes that:

1. $u_{1t} \sim N(0, \sigma^2)$ and $u_{2t} \sim N(0, \sigma^2)$. That is, the error terms in the subperiod regressions are normally distributed with the same (homoscedastic) variance $\sigma^2$.

2. The two error terms $u_{1t}$ and $u_{2t}$ are independently distributed.

The mechanics of the Chow test are as follows: 

1. Estimate regression (8.7.3), which is appropriate if there is no parameter instability, and obtain $RSS_3$ with df $= (n_1 + n_2 - k)$, where k is the number of parameters estimated, 2 in the present case. For our example $RSS_3 = 23{,}248.30$. We call $RSS_3$ the restricted residual sum of squares ($RSS_R$) because it is obtained by imposing the restrictions that $\lambda_1 = \gamma_1$ and $\lambda_2 = \gamma_2$, that is, the subperiod regressions are not different.

2. Estimate Eq. (8.7.1) and obtain its residual sum of squares, $RSS_1$, with df $= (n_1 - k)$. In our example, $RSS_1 = 1785.032$ and df = 10.

3. Estimate Eq. (8.7.2) and obtain its residual sum of squares, $RSS_2$, with df $= (n_2 - k)$. In our example, $RSS_2 = 10{,}005.22$ with df = 12.

15 Gregory C. Chow, "Tests of Equality Between Sets of Coefficients in Two Linear Regressions," 
Econometrica, vol. 28, no. 3, 1960, pp. 591-605. 
4. Since the two sets of samples are deemed independent, we can add $RSS_1$ and $RSS_2$ to obtain what may be called the unrestricted residual sum of squares ($RSS_{UR}$), that is,

$$RSS_{UR} = RSS_1 + RSS_2 \quad \text{with df} = (n_1 + n_2 - 2k)$$

In the present case,

$$RSS_{UR} = (1785.032 + 10{,}005.22) = 11{,}790.252$$


5. Now the idea behind the Chow test is that if in fact there is no structural change (i.e., regressions [8.7.1] and [8.7.2] are essentially the same), then $RSS_R$ and $RSS_{UR}$ should not be statistically different. Therefore, if we form the following ratio:

$$F = \frac{(RSS_R - RSS_{UR})/k}{RSS_{UR}/(n_1 + n_2 - 2k)} \sim F_{[k,\,(n_1 + n_2 - 2k)]} \qquad (8.7.4)$$

then Chow has shown that under the null hypothesis that the regressions (8.7.1) and (8.7.2) are (statistically) the same (i.e., no structural change or break), the F ratio given above follows the F distribution with k and $(n_1 + n_2 - 2k)$ df in the numerator and denominator, respectively.

6. Therefore, we do not reject the null hypothesis of parameter stability (i.e., no structural change) if the computed F value in an application does not exceed the critical F value obtained from the F table at the chosen level of significance (or the p value). In this case we may be justified in using the pooled (restricted) regression (8.7.3). Contrarily, if the computed F value exceeds the critical F value, we reject the hypothesis of parameter stability and conclude that the regressions (8.7.1) and (8.7.2) are different, in which case the pooled regression (8.7.3) is of dubious value, to say the least.
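Steps 1 through 6 reduce to a few lines of arithmetic once the three RSS values are in hand. The sketch below (hypothetical function names) computes the Chow F of (8.7.4) from the RSS values of the savings-income example. Because the numerator df here equal k = 2, it can also compute the exact p value from the closed form $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$ for an $F(2, \nu)$ variable; for other numerator df a statistical table or library would be needed.

```python
def chow_f(rss_pooled, rss1, rss2, n1, n2, k):
    """Chow F statistic of Eq. (8.7.4).

    rss_pooled : RSS of the pooled (restricted) regression, RSS_R
    rss1, rss2 : RSS of the two subperiod regressions
    n1, n2     : subperiod sample sizes; k: parameters per regression
    Returns (F, numerator df, denominator df).
    """
    rss_ur = rss1 + rss2                 # unrestricted RSS (step 4)
    df_num, df_den = k, n1 + n2 - 2 * k
    f_stat = ((rss_pooled - rss_ur) / df_num) / (rss_ur / df_den)
    return f_stat, df_num, df_den


def p_value_df1_is_2(f_stat, df_den):
    """Exact upper tail of F(2, df_den); valid only for numerator df = 2."""
    return (1.0 + 2.0 * f_stat / df_den) ** (-df_den / 2.0)


# Savings-income example: RSS values from (8.7.1a), (8.7.2a) and (8.7.3a).
f_stat, d1, d2 = chow_f(23248.30, 1785.032, 10005.22, n1=12, n2=14, k=2)
p_val = p_value_df1_is_2(f_stat, d2)     # applicable because d1 = 2
print(round(f_stat, 2), d1, d2, round(p_val, 5))
```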

Returning to our example, we find that

$$F = \frac{(23{,}248.30 - 11{,}790.252)/2}{(11{,}790.252)/22} = 10.69 \qquad (8.7.5)$$

From the F tables, we find that for 2 and 22 df the 1 percent critical F value is 5.72. Therefore, the probability of obtaining an F value of as much as or greater than 10.69 is much smaller than 1 percent; actually the p value is only 0.00057.

The Chow test therefore seems to support our earlier hunch that the savings-income 
relation has undergone a structural change in the United States over the period 1970-1995, 
assuming that the assumptions underlying the test are fulfilled. We will have more to say 
about this shortly. 

Incidentally, note that the Chow test can be easily generalized to handle cases of more than one structural break. For example, if we believe that the savings-income relation changed after President Clinton took office in January 1993, we could divide our sample into three periods: 1970-1981, 1982-1992, 1993-1995, and carry out the Chow test. Of course, we will have four RSS terms, one for each subperiod and one for the pooled data. But the logic of the test remains the same. Data through 2007 are now available to extend the last period to 2007.
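The Chow computation, including the multi-break generalization just described, can be sketched directly from Table 8.9. With g regimes of k parameters each, the sketch uses the usual extension of (8.7.4): numerator df (g - 1)k and denominator df n - gk, which reduces to (8.7.4) when g = 2. It is plain Python with hypothetical helper names, uses the closed-form two-variable OLS formulas, and assumes the break years are known in advance. With a single break in 1982 it should reproduce, up to rounding and assuming the table is transcribed correctly, the F of about 10.69 found above; no specific output is claimed here.

```python
# Table 8.9 (billions of dollars): year -> (personal savings Y, DPI X).
TABLE_8_9 = {
    1970: (61.0, 727.1), 1971: (68.6, 790.2), 1972: (63.6, 855.3),
    1973: (89.6, 965.0), 1974: (97.6, 1054.2), 1975: (104.4, 1159.2),
    1976: (96.4, 1273.0), 1977: (92.5, 1401.4), 1978: (112.6, 1580.1),
    1979: (130.1, 1769.5), 1980: (161.8, 1973.3), 1981: (199.1, 2200.2),
    1982: (205.5, 2347.3), 1983: (167.0, 2522.4), 1984: (235.7, 2810.0),
    1985: (206.2, 3002.0), 1986: (196.5, 3187.6), 1987: (168.4, 3363.1),
    1988: (189.1, 3640.8), 1989: (187.8, 3894.5), 1990: (208.7, 4166.8),
    1991: (246.4, 4343.7), 1992: (272.6, 4613.7), 1993: (214.4, 4790.2),
    1994: (189.4, 5021.7), 1995: (249.3, 5320.8),
}
K = 2  # parameters per regression: intercept and slope


def rss_two_variable(pairs):
    """RSS of the two-variable OLS fit Y = b1 + b2 X + u for (Y, X) pairs."""
    n = len(pairs)
    ybar = sum(y for y, _ in pairs) / n
    xbar = sum(x for _, x in pairs) / n
    sxx = sum((x - xbar) ** 2 for _, x in pairs)
    sxy = sum((x - xbar) * (y - ybar) for y, x in pairs)
    syy = sum((y - ybar) ** 2 for y, _ in pairs)
    return syy - sxy ** 2 / sxx   # explained SS equals b2 * sxy = sxy^2 / sxx


def chow_test(data, break_years):
    """Chow F for known break years; each break year starts a new regime."""
    years = sorted(data)
    edges = [years[0]] + sorted(break_years) + [years[-1] + 1]
    regimes = [[data[t] for t in years if lo <= t < hi]
               for lo, hi in zip(edges, edges[1:])]
    if any(len(r) <= K for r in regimes):
        raise ValueError("each regime needs more than K observations")
    rss_r = rss_two_variable([data[t] for t in years])   # pooled: restricted
    rss_ur = sum(rss_two_variable(r) for r in regimes)   # sum over regimes
    g, n = len(regimes), len(years)
    df_num, df_den = (g - 1) * K, n - g * K
    f_stat = ((rss_r - rss_ur) / df_num) / (rss_ur / df_den)
    return f_stat, df_num, df_den


f_stat, d1, d2 = chow_test(TABLE_8_9, [1982])
print(round(f_stat, 2), d1, d2)
```

Passing a list with two break years gives the three-period version with four RSS terms, exactly as in the text; the pooled regression always supplies the restricted RSS.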

There are some caveats about the Chow test that must be kept in mind: 

1. The assumptions underlying the test must be fulfilled. For example, one should find 
out if the error variances in the regressions (8.7.1) and (8.7.2) are the same. We will discuss 
this point shortly. 
2. The Chow test will tell us only if the two regressions (8.7.1) and (8.7.2) are different,
without telling us whether the difference is on account of the intercepts, or the slopes, or 
both. But in Chapter 9, on dummy variables, we will see how we can answer this question. 

3. The Chow test assumes that we know the point(s) of structural break. In our example, we assumed it to be in 1982. However, if it is not possible to determine when the structural change actually took place, we may have to use other methods. 16

Before we leave the Chow test and our savings-income regression, let us examine one of the assumptions underlying the Chow test, namely, that the error variances in the two periods are the same. Since we cannot observe the true error variances, we can obtain their estimates from the RSS given in the regressions (8.7.1a) and (8.7.2a), namely,

$$\hat\sigma_1^2 = \frac{RSS_1}{n_1 - 2} = \frac{1785.032}{10} = 178.5032 \qquad (8.7.6)$$

$$\hat\sigma_2^2 = \frac{RSS_2}{n_2 - 2} = \frac{10{,}005.22}{14 - 2} = 833.7683 \qquad (8.7.7)$$


Notice that, since there are two parameters estimated in each equation, we deduct 2 from the number of observations to obtain the df. Given the assumptions underlying the Chow test, $\hat\sigma_1^2$ and $\hat\sigma_2^2$ are unbiased estimators of the true variances in the two subperiods. It can then be shown that

$$\frac{\hat\sigma_1^2/\sigma_1^2}{\hat\sigma_2^2/\sigma_2^2} \qquad (8.7.8)$$

follows the F distribution with $(n_1 - k)$ and $(n_2 - k)$ df in the numerator and the denominator, respectively; in our example k = 2, since there are only two parameters in each subregression.

Of course, if $\sigma_1^2 = \sigma_2^2$, that is, the variances in the two subpopulations are the same (as assumed by the Chow test), the preceding F test reduces to computing

$$F = \frac{\hat\sigma_1^2}{\hat\sigma_2^2} \qquad (8.7.9)$$


Note: By convention we put the larger of the two estimated variances in the numerator. (See 
Appendix A for the details of the F and other probability distributions.) 

Computing this F in an application and comparing it with the critical F value with the 
appropriate df, one can decide to reject or not reject the null hypothesis that the variances 
in the two subpopulations are the same. If the null hypothesis is not rejected, then one can 
use the Chow test. 
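The computation in (8.7.6) through (8.7.9), together with the convention of putting the larger variance estimate in the numerator, can be sketched as follows. The function name is hypothetical, and the critical values in the comment are the ones the text quotes for 12 and 10 df.

```python
def variance_ratio_f(rss1, n1, rss2, n2, k=2):
    """Variance-ratio F test of Eqs. (8.7.6)-(8.7.9).

    Each subperiod error variance is estimated as RSS / (n - k); by
    convention the larger estimate goes in the numerator.
    Returns (F, numerator df, denominator df).
    """
    s1 = rss1 / (n1 - k)
    s2 = rss2 / (n2 - k)
    if s1 >= s2:
        return s1 / s2, n1 - k, n2 - k
    return s2 / s1, n2 - k, n1 - k


# Savings-income example: RSS and sample sizes from (8.7.1a) and (8.7.2a).
f_stat, d1, d2 = variance_ratio_f(1785.032, 12, 10005.22, 14)
# Critical values quoted in the text for 12 and 10 df: 2.91 (5%) and 4.71 (1%).
print(round(f_stat, 4), d1, d2, f_stat > 2.91, f_stat > 4.71)
```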

Returning to our savings-income regression, we obtain the following result:

$$F = \frac{833.7683}{178.5032} = 4.6709 \qquad (8.7.10)$$


Under the null hypothesis of equality of variances in the two subpopulations, this F value follows the F distribution with 12 and 10 df, in the numerator and denominator, respectively. (Note: We have put the larger of the two estimated variances in the numerator.) From the F tables in Appendix D, we see that the 5 and 1 percent critical F values for 12 and 10 df are 2.91 and 4.71, respectively. The computed F value is significant at the 5 percent level and is almost significant at the 1 percent level. Thus, our conclusion would be that the two subpopulation variances are not the same and, therefore, strictly speaking we should not use the Chow test.

16 For a detailed discussion, see William H. Greene, Econometric Analysis, 4th ed., Prentice Hall, Englewood Cliffs, NJ, 2000, pp. 293-297.

Our purpose here has been to demonstrate the mechanics of the Chow test, which is popularly used in applied work. If the error variances in the two subpopulations are not the same (i.e., the errors are heteroscedastic), the Chow test can be modified. But the procedure is beyond the scope of this book. 17

Another point we made earlier was that the Chow test is sensitive to the choice of the 
time at which the regression parameters might have changed. In our example, we assumed 
that the change probably took place in the recession year of 1982. If we had assumed it to 
be 1981, when Ronald Reagan began his presidency, we might have found the computed F 
value to be different. As a matter of fact, in Exercise 8.34 the reader is asked to check this out. 

If we do not want to choose the point at which the break in the underlying relationship 
might have occurred, we could choose alternative methods, such as the recursive residual 
test. We will take this topic up in Chapter 13, the chapter on model specification analysis. 

8.8 Prediction with Multiple Regression 

In Section 5.10 we showed how the estimated two-variable regression model can be used for (1) mean prediction, that is, predicting the point on the population regression function (PRF), as well as for (2) individual prediction, that is, predicting an individual value of Y given the value of the regressor $X = X_0$, where $X_0$ is the specified numerical value of X.

The estimated multiple regression too can be used for similar purposes, and the procedure for doing that is a straightforward extension of the two-variable case, except the formulas for estimating the variances and standard errors of the forecast value (comparable to Eqs. [5.10.2] and [5.10.6] of the two-variable model) are rather involved and are better handled by the matrix methods discussed in Appendix C. Of course, most standard regression packages can do this routinely, so there is no need to look up the matrix formulation. It is given in Appendix C for the benefit of mathematically inclined students. This appendix also gives a fully worked out example.
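A minimal plain-Python sketch of mean and individual prediction in a multiple regression. It uses the standard matrix results that the chapter defers to Appendix C: with $\hat\sigma^2 = RSS/(n-k)$, the estimated variance of the mean prediction at a regressor vector $x_0$ is $\hat\sigma^2 x_0'(X'X)^{-1}x_0$, and that of an individual prediction is $\hat\sigma^2[1 + x_0'(X'X)^{-1}x_0]$. The data and the helper names `solve` and `predict` are made up for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination."""
    p = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


def predict(X, y, x0):
    """Forecast at x0 with standard errors for mean and individual prediction.

    Rows of X start with 1 for the intercept; x0 must use the same layout.
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    rss = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
              for r, yi in zip(X, y))
    sigma2 = rss / (n - k)                                # unbiased error variance
    q = sum(u * v for u, v in zip(x0, solve(xtx, x0)))    # x0'(X'X)^{-1} x0
    y0 = sum(bj * xj for bj, xj in zip(b, x0))
    return y0, math.sqrt(sigma2 * q), math.sqrt(sigma2 * (1.0 + q))


# Made-up data: Y regressed on a constant, X2 and X3.
x2 = [1, 2, 3, 4, 5, 6, 7, 8]
x3 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8]
X = [[1.0, a, c] for a, c in zip(x2, x3)]
y0, se_mean, se_ind = predict(X, y, [1.0, 6.5, 5.0])
print(round(y0, 3), round(se_mean, 3), round(se_ind, 3))
```

Evaluating at the sample means of the regressors is a handy check: with an intercept in the model, the forecast there equals $\bar Y$ and $x_0'(X'X)^{-1}x_0 = 1/n$, so the individual-prediction variance is exactly $(n + 1)$ times the mean-prediction variance.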

*8.9 The Troika of Hypothesis Tests: The Likelihood Ratio (LR), 
Wald (W), and Lagrange Multiplier (LM) Tests 18 

In this and the previous chapters we have, by and large, used the t, F, and chi-square tests 
to test a variety of hypotheses in the context of linear (in-parameter) regression models. But 
once we go beyond the somewhat comfortable world of linear regression models, we need 
a method(s) to test hypotheses that can handle regression models, linear or not. 

The well-known trinity of likelihood ratio, Wald, and Lagrange multiplier tests can accomplish this purpose. The interesting thing to note is that asymptotically (i.e., in large samples) all three tests are equivalent in that the test statistic associated with each of these tests follows the chi-square distribution.

*Optional.

17 For a discussion of the Chow test under heteroscedasticity, see William H. Greene, Econometric Analysis, 4th ed., Prentice Hall, Englewood Cliffs, NJ, 2000, pp. 292-293, and Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar, U.K., 1994, p. 51.

18 For an accessible discussion, see A. Buse, "The Likelihood Ratio, Wald and Lagrange Multiplier Tests: An Expository Note," American Statistician, vol. 36, 1982, pp. 153-157.

Although we will discuss the likelihood ratio test in the appendix to this chapter, in 
general we will not use these tests in this textbook for the pragmatic reason that in small, or 
finite, samples, which is unfortunately what most researchers deal with, the F test that we 
have used so far will suffice. As Davidson and MacKinnon note: 

For linear regression models, with or without normal errors, there is of course no need to look 
at LM, W and LR at all, since no information is gained from doing so over and above what is 
already contained in F . 19 
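For the linear regression model with normal errors and m linear restrictions, the three statistics have well-known closed forms in terms of the same two residual sums of squares used by the F test (stated here without proof, using maximum likelihood variance estimates): $W = n(RSS_R - RSS_{UR})/RSS_{UR}$, $LR = n \ln(RSS_R/RSS_{UR})$, and $LM = n(RSS_R - RSS_{UR})/RSS_R$, each asymptotically chi-square with m df. The sketch below (hypothetical function name) evaluates them with the RSS values of Example 8.3 and checks that each is a monotone function of F, which is one way to read the Davidson-MacKinnon remark.

```python
import math


def trinity(rss_r, rss_ur, n):
    """Wald, likelihood ratio and Lagrange multiplier statistics for linear
    restrictions in a linear regression with normal errors; each is
    asymptotically chi-square with m df under H0."""
    wald = n * (rss_r - rss_ur) / rss_ur
    lr = n * math.log(rss_r / rss_ur)
    lm = n * (rss_r - rss_ur) / rss_r
    return wald, lr, lm


# RSS values from Example 8.3: n = 20, k = 3, m = 1.
n, k, m = 20, 3, 1
rss_r, rss_ur = 0.0166, 0.0136
w, lr, lm = trinity(rss_r, rss_ur, n)
f_stat = ((rss_r - rss_ur) / m) / (rss_ur / (n - k))
print(f"W = {w:.4f}, LR = {lr:.4f}, LM = {lm:.4f}, F = {f_stat:.4f}")
# W equals n * m * F / (n - k); LR and LM are likewise increasing in F.
```

In this model the ordering W ≥ LR ≥ LM holds in every sample, so the three can disagree at a given significance level in small samples even though they coincide asymptotically.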


8.10 Testing the Functional Form of Regression: Choosing 
between Linear and Log-Linear Regression Models 

The choice between a linear regression model (the regressand is a linear function of the regressors) or a log-linear regression model (the log of the regressand is a function of the logs of the regressors) is a perennial question in empirical analysis. We can use a test proposed by MacKinnon, White, and Davidson, which for brevity we call the MWD test, to choose between the two models. 20

To illustrate this test, assume the following:

$H_0$: Linear Model: Y is a linear function of regressors, the X's.

$H_1$: Log-Linear Model: ln Y is a linear function of logs of regressors, the logs of X's.

where, as usual, $H_0$ and $H_1$ denote the null and alternative hypotheses.

The MWD test involves the following steps: 21

Step I: Estimate the linear model and obtain the estimated Y values. Call them Yf (i.e., $\hat Y$).

Step II: Estimate the log-linear model and obtain the estimated ln Y values; call them ln f (i.e., $\widehat{\ln Y}$).

Step III: Obtain $Z_1 = (\ln Yf - \ln f)$.

Step IV: Regress Y on X's and $Z_1$ obtained in Step III. Reject $H_0$ if the coefficient of $Z_1$ is statistically significant by the usual t test.

Step V: Obtain $Z_2 = (\text{antilog of } \ln f - Yf)$.

Step VI: Regress log of Y on the logs of X's and $Z_2$. Reject $H_1$ if the coefficient of $Z_2$ is statistically significant by the usual t test.

Although the MWD test seems involved, the logic of the test is quite simple. If the linear model is in fact the correct model, the constructed variable $Z_1$ should not be statistically significant in Step IV, for in that case the estimated Y values from the linear model and those estimated from the log-linear model (after taking their antilog values for comparative purposes) should not be different. The same comment applies to the alternative hypothesis $H_1$.
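Because the rose data of Exercise 7.16 are not reproduced in this section, the following sketch applies Steps I through VI to a small made-up data set (the numbers carry no economic meaning). It is plain Python; `solve`, `ols` and `mwd_test` are hypothetical helper names, and the t statistics use the usual OLS standard errors. Step III takes the log of the linear model's fitted values, so the sketch raises an error if any of them is not positive.

```python
import math


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


def ols(X, y):
    """OLS of y on X (rows start with 1). Returns (b, fitted, t statistics)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in X]
    sigma2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - k)
    t = []
    for j in range(k):
        e_j = [1.0 if i == j else 0.0 for i in range(k)]
        c_jj = solve(xtx, e_j)[j]            # j-th diagonal element of (X'X)^{-1}
        t.append(b[j] / math.sqrt(sigma2 * c_jj))
    return b, fitted, t


def mwd_test(y, x2, x3):
    """Return the t statistics of Z1 (Step IV) and Z2 (Step VI)."""
    lin_X = [[1.0, a, c] for a, c in zip(x2, x3)]
    log_X = [[1.0, math.log(a), math.log(c)] for a, c in zip(x2, x3)]
    ln_y = [math.log(v) for v in y]
    _, yf, _ = ols(lin_X, y)                                    # Step I
    _, lnf, _ = ols(log_X, ln_y)                                # Step II
    if min(yf) <= 0:
        raise ValueError("Step III needs positive fitted values Yf")
    z1 = [math.log(a) - b for a, b in zip(yf, lnf)]             # Step III
    _, _, t1 = ols([r + [z] for r, z in zip(lin_X, z1)], y)     # Step IV
    z2 = [math.exp(b) - a for a, b in zip(yf, lnf)]             # Step V
    _, _, t2 = ols([r + [z] for r, z in zip(log_X, z2)], ln_y)  # Step VI
    return t1[-1], t2[-1]


# Made-up illustrative data (12 observations): quantity, own price, rival price.
y = [7500, 7300, 6200, 8900, 9000, 6100, 5200, 9300, 7400, 6800, 8100, 6400]
x2 = [3.0, 3.2, 3.5, 2.8, 2.6, 3.9, 4.1, 2.5, 3.3, 3.7, 2.9, 3.6]
x3 = [2.0, 2.4, 2.1, 2.6, 2.2, 2.8, 2.5, 2.3, 2.7, 2.9, 2.0, 2.4]
t_z1, t_z2 = mwd_test(y, x2, x3)
print(f"t(Z1) = {t_z1:.3f}, t(Z2) = {t_z2:.3f}")
```

Each returned t statistic is then compared with the usual t critical value for the df of its own auxiliary regression (here 12 - 4 = 8).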

'Optional. 

19 Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 1993, p. 456.

20 J. MacKinnon, H. White, and R. Davidson, "Tests for Model Specification in the Presence of Alternative Hypotheses: Some Further Results," Journal of Econometrics, vol. 21, 1983, pp. 53-70. A similar test is proposed in A. K. Bera and C. M. Jarque, "Model Specification Tests: A Simultaneous Approach," Journal of Econometrics, vol. 20, 1982, pp. 59-82.

21 This discussion is based on William H. Greene, ET: The Econometrics Toolkit Version 3, Econometric Software, Bellport, New York, 1992, pp. 245-246.
EXAMPLE 8.5

The Demand for Roses


Refer to Exercise 7.16 where we have presented data on the demand for roses in the Detroit metropolitan area for the period 1971-III to 1975-II. For illustrative purposes, we will consider the demand for roses as a function only of the prices of roses and carnations, leaving out the income variable for the time being. Now we consider the following models:

Linear model:

$$Y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + u_t \qquad (8.10.1)$$

Log-linear model:

$$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + u_t \qquad (8.10.2)$$

where Y is the quantity of roses in dozens, $X_2$ is the average wholesale price of roses (dollars/dozen), and $X_3$ is the average wholesale price of carnations (dollars/dozen). A priori, $\alpha_2$ and $\beta_2$ are expected to be negative (why?), and $\alpha_3$ and $\beta_3$ are expected to be positive (why?). As we know, the slope coefficients in the log-linear model are elasticity coefficients.

The regression results are as follows:

$$\hat Y_t = 9734.2176 - 3782.1956 X_{2t} + 2815.2515 X_{3t} \qquad (8.10.3)$$
t = (3.3705) (-6.6069) (2.9712)
F = 21.84   $R^2 = 0.77096$

$$\widehat{\ln Y}_t = 9.2278 - 1.7607 \ln X_{2t} + 1.3398 \ln X_{3t} \qquad (8.10.4)$$
t = (16.2349) (-5.9044) (2.5407)
F = 17.50   $R^2 = 0.7292$

As these results show, both the linear and the log-linear models seem to fit the data reasonably well: The parameters have the expected signs and the t and F values are statistically significant.

To decide between these models on the basis of the MWD test, we first test the hypothesis that the true model is linear. Then, following Step IV of the test, we obtain the following regression:

$$\hat Y_t = 9727.5685 - 3783.0623 X_{2t} + 2817.7157 X_{3t} + 85.2319 Z_{1t} \qquad (8.10.5)$$
t = (3.2178) (-6.3337) (2.8366) (0.0207)
F = 13.44   $R^2 = 0.7707$

Since the coefficient of $Z_1$ is not statistically significant (the p value of the estimated t is 0.98), we do not reject the hypothesis that the true model is linear.

Suppose we switch gears and assume that the true model is log-linear. Following Step VI of the MWD test, we obtain the following regression results:

$$\widehat{\ln Y}_t = 9.1486 - 1.9699 \ln X_{2t} + 1.5891 \ln X_{3t} - 0.0013 Z_{2t} \qquad (8.10.6)$$
t = (17.0825) (-6.4189) (3.0728) (-1.6612)
F = 14.17   $R^2 = 0.7798$

The coefficient of $Z_2$ is statistically significant at about the 12 percent level (p value is 0.1225). Therefore, we can reject the hypothesis that the true model is log-linear at this level of significance. Of course, if one sticks to the conventional 1 or 5 percent significance levels, then one cannot reject the hypothesis that the true model is log-linear. As this example shows, it is quite possible that in a given situation we cannot reject either of the specifications.
Summary and Conclusions

1. This chapter extended and refined the ideas of interval estimation and hypothesis testing first introduced in Chapter 5 in the context of the two-variable linear regression model.

2. In a multiple regression, testing the individual significance of a partial regression coefficient (using the t test) and testing the overall significance of the regression (i.e., $H_0$: all partial slope coefficients are zero or $R^2 = 0$) are not the same thing.

3. In particular, the finding that one or more partial regression coefficients are statistically insignificant on the basis of the individual t test does not mean that all partial regression coefficients are also (collectively) statistically insignificant. The latter hypothesis can be tested only by the F test.

4. The F test is versatile in that it can test a variety of hypotheses, such as whether (1) an individual regression coefficient is statistically significant, (2) all partial slope coefficients are zero, (3) two or more coefficients are statistically equal, (4) the coefficients satisfy some linear restrictions, and (5) there is structural stability of the regression model.

5. As in the two-variable case, the multiple regression model can be used for the purpose of mean and/or individual prediction.

EXERCISES

Questions

8.1. Suppose you want to study the behavior of sales of a product, say, automobiles, over a number of years, and suppose someone suggests you try the following models:

$$Y_t = \beta_0 + \beta_1 t$$
$$Y_t = \alpha_0 + \alpha_1 t + \alpha_2 t^2$$

where $Y_t$ = sales at time t and t = time, measured in years. The first model postulates that sales is a linear function of time, whereas the second model states that it is a quadratic function of time.

a. Discuss the properties of these models. 

b. How would you decide between the two models? 

c. In what situations will the quadratic model be useful? 

d. Try to obtain data on automobile sales in the United States over the past 20 years 
and see which of the models fits the data better. 

8.2. Show that the F ratio of Eq. (8.4.16) is equal to the F ratio of Eq. (8.4.18). (Hint: ESS/TSS = $R^2$.)

8.3. Show that F tests of Eq. (8.4.18) and Eq. (8.6.10) are equivalent.

8.4. Establish statements (8.6.11) and (8.6.12). 

8.5. Consider the Cobb-Douglas production function

$$Y = \beta_1 L^{\beta_2} K^{\beta_3} \qquad (1)$$

where Y = output, L = labor input, and K = capital input. Dividing (1) through by K, we get

$$(Y/K) = \beta_1 (L/K)^{\beta_2} K^{\beta_2 + \beta_3 - 1} \qquad (2)$$

Taking the natural log of (2) and adding the error term, we obtain

$$\ln(Y/K) = \beta_0 + \beta_2 \ln(L/K) + (\beta_2 + \beta_3 - 1)\ln K + u_i \qquad (3)$$

where $\beta_0 = \ln \beta_1$.

a. Suppose you had data to run the regression (3). How would you test the hypothesis that there are constant returns to scale, i.e., $(\beta_2 + \beta_3) = 1$?

b. If there are constant returns to scale, how would you interpret regression (3)?

c. Does it make any difference whether we divide (1) by L rather than by K?

8.6. Critical values of $R^2$ when true $R^2 = 0$. Equation (8.4.11) gave the relationship between F and $R^2$ under the hypothesis that all partial slope coefficients are simultaneously equal to zero (i.e., $R^2 = 0$). Just as we can find the critical F value at the $\alpha$ level of significance from the F table, we can find the critical $R^2$ value from the following relation:

$$R^2 = \frac{(k-1)F}{(k-1)F + (n-k)}$$

where k is the number of parameters in the regression model including the intercept and where F is the critical F value at the $\alpha$ level of significance. If the observed $R^2$ exceeds the critical $R^2$ obtained from the preceding formula, we can reject the hypothesis that the true $R^2$ is zero.

Establish the preceding formula and find out the critical $R^2$ value (at $\alpha$ = 5 percent) for the regression (8.1.4).
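Once a critical F value has been looked up, the relation in Exercise 8.6 is a one-line computation. The Python sketch below uses a purely illustrative case: k = 3, n = 16, and the 5 percent critical value $F_{2,13} \approx 3.81$ from a standard F table. It does not use the data of regression (8.1.4).

```python
def critical_r2(f_crit: float, n: int, k: int) -> float:
    """Critical R^2 implied by a critical F value with (k - 1, n - k) df."""
    return (k - 1) * f_crit / ((k - 1) * f_crit + (n - k))

# Hypothetical illustration: k = 3 parameters, n = 16 observations, F(2, 13) = 3.81
print(round(critical_r2(3.81, n=16, k=3), 4))
```

An observed $R^2$ above the value printed would lead to rejection of $H_0$: true $R^2 = 0$ at the 5 percent level in this hypothetical setting.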

8.7. From annual data for the years 1968-1987, the following regression results were obtained:

$$\hat{Y}_t = -859.92 + 0.6470X_{2t} - 23.195X_{3t} \qquad R^2 = 0.9776 \qquad (1)$$

$$\hat{Y}_t = -261.09 + 0.2452X_{2t} \qquad R^2 = 0.9388 \qquad (2)$$

where Y = U.S. expenditure on imported goods, billions of 1982 dollars, $X_2$ = personal disposable income, billions of 1982 dollars, and $X_3$ = trend variable. True or false: The standard error of $\hat{\beta}_3$ in (1) is 4.2750. Show your calculations. (Hint: Use the relationship between $R^2$, F, and t.)
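The chain of reasoning in the hint for Exercise 8.7 can be sketched in Python. Because (1) adds one regressor to (2), the incremental F statistic computed from the two $R^2$ values equals the square of the t ratio of $\hat{\beta}_3$. Dividing the coefficient by that t ratio then gives its standard error. The sketch assumes n = 20 annual observations (1968-1987) and k = 3 parameters in (1). Compare the result with the value stated in the exercise.

```python
import math

def incremental_f(r2_new, r2_old, n, k_new, added=1):
    """F statistic for adding `added` regressors, from the R^2 of the two nested models."""
    return ((r2_new - r2_old) / added) / ((1 - r2_new) / (n - k_new))

n = 20                                          # annual observations, 1968-1987
f_stat = incremental_f(0.9776, 0.9388, n, k_new=3)
t_stat = math.sqrt(f_stat)                      # one added regressor: F = t^2
se_b3 = 23.195 / t_stat                         # |coefficient| / |t| = standard error
print(round(f_stat, 2), round(t_stat, 3), round(se_b3, 3))
```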

8.8. Suppose in the regression

$$\ln(Y_i/X_{2i}) = \alpha_1 + \alpha_2 \ln X_{2i} + \alpha_3 \ln X_{3i} + u_i$$

the values of the regression coefficients and their standard errors are known.* From this knowledge, how would you estimate the parameters and standard errors of the following regression model?

$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i$$

8.9. Assume the following:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{2i}X_{3i} + u_i$$

where Y is personal consumption expenditure, $X_2$ is personal income, and $X_3$ is personal wealth.† The term $(X_{2i}X_{3i})$ is known as the interaction term. What is meant by this expression? How would you test the hypothesis that the marginal propensity to consume (MPC) (i.e., $\beta_2$) is independent of the wealth of the consumer?


*Adapted from Peter Kennedy, A Guide to Econometrics, the MIT Press, 3d ed., Cambridge, Mass., 1992, p. 310.
†Ibid., p. 327.




264 Part One Single-Equation Regression Models 


8.10. You are given the following regression results:

$$\hat{Y}_t = 16{,}899 - 2978.5X_{2t} \qquad R^2 = 0.6149$$

$$t = (8.5152)\quad(-4.7280)$$

$$\hat{Y}_t = 9734.2 - 3782.2X_{2t} + 2815X_{3t} \qquad R^2 = 0.7706$$

$$t = (3.3705)\quad(-6.6070)\quad(2.9712)$$

Can you find out the sample size underlying these results? (Hint: Recall the relationship between $R^2$, F, and t values.)

8.11. Based on our discussion of individual and joint tests of hypotheses based, respectively, on the t and F tests, which of the following situations are likely?

1. Reject the joint null on the basis of the F statistic, but do not reject each separate null on the basis of the individual t tests.

2. Reject the joint null on the basis of the F statistic, reject one individual hypothesis on the basis of the t test, and do not reject the other individual hypotheses on the basis of the t test.

3. Reject the joint null hypothesis on the basis of the F statistic, and reject each separate null hypothesis on the basis of the individual t tests.

4. Do not reject the joint null on the basis of the F statistic, and do not reject each separate null on the basis of individual t tests.

5. Do not reject the joint null on the basis of the F statistic, reject one individual hypothesis on the basis of a t test, and do not reject the other individual hypotheses on the basis of the t test.

6. Do not reject the joint null on the basis of the F statistic, but reject each separate null on the basis of individual t tests.*

Empirical Exercises 

8.12. Refer to Exercise 7.21. 

a. What are the real income and interest rate elasticities of real cash balances? 

b. Are the preceding elasticities statistically significant individually? 

c. Test the overall significance of the estimated regression. 

d. Is the income elasticity of demand for real cash balances significantly different 
from unity? 

e. Should the interest rate variable be retained in the model? Why? 

8.13. From the data for 46 states in the United States for 1992, Baltagi obtained the following regression results:†

$$\widehat{\log C} = 4.30 - 1.34\log P + 0.17\log Y$$

$$se = (0.91)\quad(0.32)\quad(0.20) \qquad \bar{R}^2 = 0.27$$

where C = cigarette consumption, packs per year
P = real price per pack
Y = real disposable income per capita


*Quoted from Ernst R. Berndt, The Practice of Econometrics: Classic and Contemporary, Addison-Wesley, Reading, Mass., 1991, p. 79.

†See Badi H. Baltagi, Econometrics, Springer-Verlag, New York, 1998, p. 111.
a. What is the elasticity of demand for cigarettes with respect to price? Is it statistically significant? If so, is it statistically different from 1?

b. What is the income elasticity of demand for cigarettes? Is it statistically significant? If not, what might be the reasons for it?

c. How would you retrieve $R^2$ from the adjusted $R^2$ given above?
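Part (c) of Exercise 8.13 rests on the definition $\bar{R}^2 = 1 - (1 - R^2)(n-1)/(n-k)$, which can be inverted directly. A minimal Python sketch, treating the 0.27 reported in the exercise as the adjusted $R^2$ (as part (c) indicates) and assuming n = 46 states and k = 3 estimated parameters (intercept plus two slopes):

```python
def r2_from_adjusted(adj_r2: float, n: int, k: int) -> float:
    """Invert adj R^2 = 1 - (1 - R^2)(n - 1)/(n - k) to recover R^2."""
    return 1 - (1 - adj_r2) * (n - k) / (n - 1)

print(round(r2_from_adjusted(0.27, n=46, k=3), 4))
```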

8.14. From a sample of 209 firms, Wooldridge obtained the following regression results:* 

$$\widehat{\log(\text{salary})} = 4.32 + 0.280\log(\text{sales}) + 0.0174\,\text{roe} + 0.00024\,\text{ros}$$
$$se = (0.32)\quad(0.035)\quad(0.0041)\quad(0.00054)$$

$$R^2 = 0.283$$

where salary = salary of CEO 

sales = annual firm sales 
roe = return on equity in percent 
ros = return on firm’s stock 

and where figures in the parentheses are the estimated standard errors. 

a. Interpret the preceding regression taking into account any prior expectations that 
you may have about the signs of the various coefficients. 

b. Which of the coefficients are individually statistically significant at the 5 percent 
level? 

c. What is the overall significance of the regression? Which test do you use? 
And why? 

d. Can you interpret the coefficients of roe and ros as elasticity coefficients? Why or 
why not? 
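For part (b) of Exercise 8.14, each t ratio is simply the coefficient divided by its standard error. The Python sketch below computes these ratios. With 209 − 4 = 205 degrees of freedom, the two-tailed 5 percent critical t value is about 1.97, which is the cutoff assumed here.

```python
# Coefficient and standard error for each regressor, as reported in Exercise 8.14
estimates = {
    "intercept": (4.32, 0.32),
    "log(sales)": (0.280, 0.035),
    "roe": (0.0174, 0.0041),
    "ros": (0.00024, 0.00054),
}
T_CRIT = 1.97  # approximate two-tailed 5 percent critical value, 205 df

t_ratios = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_ratios.items():
    print(f"{name:<11} t = {t:6.2f}   |t| > {T_CRIT}: {abs(t) > T_CRIT}")
```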

8.15. Assuming that Y and $X_2, X_3, \ldots$ are jointly normally distributed and assuming that the null hypothesis is that the population partial correlations are individually equal to zero, R. A. Fisher has shown that

$$t = \frac{r_{12.34\ldots k}\sqrt{n - k - 2}}{\sqrt{1 - r_{12.34\ldots k}^2}}$$

follows the t distribution with n − k − 2 df, where k is the order of the partial correlation coefficient and where n is the total number of observations. (Note: $r_{12.3}$ is a first-order partial correlation coefficient, $r_{12.34}$ is a second-order partial correlation coefficient, and so on.) Refer to Exercise 7.2. Assuming Y and $X_2$ and $X_3$ to be jointly normally distributed, compute the three partial correlations $r_{12.3}$, $r_{13.2}$, and $r_{23.1}$ and test their significance under the hypothesis that the corresponding population correlations are individually equal to zero.
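Fisher's statistic in Exercise 8.15 can be coded together with the standard formula for a first-order partial correlation, $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$. This Python sketch uses hypothetical correlations, since the data of Exercise 7.2 are not reproduced here. The other first-order partials follow by permuting the subscripts.

```python
import math

def first_order_partial(r12: float, r13: float, r23: float) -> float:
    """r_{12.3}: correlation between variables 1 and 2, holding variable 3 constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def fisher_t(r_partial: float, n: int, order: int):
    """t statistic (and its df = n - order - 2) for H0: population partial correlation = 0."""
    df = n - order - 2
    return r_partial * math.sqrt(df) / math.sqrt(1 - r_partial ** 2), df

# Hypothetical simple correlations r12, r13, r23 and sample size n = 20
r12_3 = first_order_partial(0.8, 0.5, 0.4)
t_stat, df = fisher_t(r12_3, n=20, order=1)
print(round(r12_3, 4), round(t_stat, 3), df)
```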

8.16. In studying the demand for farm tractors in the United States for the periods 1921-1941 and 1948-1957, Griliches† obtained the following results:

$$\widehat{\log Y_t} = \text{constant} - 0.519\log X_{2t} - 4.933\log X_{3t} \qquad R^2 = 0.793$$
$$(0.231)\quad(0.477)$$

*See Jeffrey M. Wooldridge, Introductory Econometrics, South-Western Publishing Co., 2000, pp. 154-155.

†Z. Griliches, "The Demand for a Durable Input: Farm Tractors in the United States, 1921-1957," in The Demand for Durable Goods, Arnold C. Harberger (ed.), The University of Chicago Press, Chicago, 1960, Table 1, p. 192.
where $Y_t$ = value of stock of tractors on farms as of January 1, in 1935-1939 dollars, $X_2$ = index of prices paid for tractors divided by an index of prices received for all crops at time t − 1, and $X_3$ = interest rate prevailing in year t − 1. The estimated standard errors are given in the parentheses.

a. Interpret the preceding regression. 

b. Are the estimated slope coefficients individually statistically significant? Are they 
significantly different from unity? 

c. Use the analysis of variance technique to test the significance of the overall regression. Hint: Use the $R^2$ variant of the ANOVA technique.

d. How would you compute the interest-rate elasticity of demand for farm tractors?

e. How would you test the significance of estimated $R^2$?

8.17. Consider the following wage-determination equation for the British economy* for the period 1950-1969:

$$\hat{W}_t = 8.582 + 0.364(PF)_t + 0.004(PF)_{t-1} - 2.560U_t$$

$$(1.129)\quad(0.080)\quad(0.072)\quad(0.658)$$

$$R^2 = 0.873 \qquad df = 15$$

where W = wages and salaries per employee 
PF = prices of final output at factor cost 
U = unemployment in Great Britain as a percentage of the total number of 
employees in Great Britain 
t = time 

(The figures in the parentheses are the estimated standard errors.) 

a. Interpret the preceding equation. 

b. Are the estimated coefficients individually significant? 

c. What is the rationale for the introduction of $(PF)_{t-1}$?

d. Should the variable $(PF)_{t-1}$ be dropped from the model? Why?

e. How would you compute the elasticity of wages and salaries per employee with respect to the unemployment rate U?

8.18. A variation of the wage-determination equation given in Exercise 8.17 is as follows:†

$$\hat{W}_t = 1.073 + 5.288V_t - 0.116X_t + 0.054M_t + 0.046M_{t-1}$$

$$(0.797)\quad(0.812)\quad(0.111)\quad(0.022)\quad(0.019)$$

$$R^2 = 0.934 \qquad df = 14$$

where W = wages and salaries per employee 

V = unfilled job vacancies in Great Britain as a percentage of the total 
number of employees in Great Britain 
X = gross domestic product per person employed 
M = import prices

$M_{t-1}$ = import prices in the previous (or lagged) year
(The estimated standard errors are given in the parentheses.)


*Taken from Prices and Earnings in 1951-1969: An Econometric Assessment, Dept. of Employment, HMSO, 1971, Eq. (19), p. 35.

†Ibid., Eq. (67), p. 37.
a. Interpret the preceding equation.

b. Which of the estimated coefficients are individually statistically significant?

c. What is the rationale for the introduction of the X variable? A priori, is the sign of X expected to be negative?

d. What is the purpose of introducing both $M_t$ and $M_{t-1}$ in the model?

e. Which of the variables may be dropped from the model? Why?

f. Test the overall significance of the observed regression.

8.19. For the demand for chicken function estimated in Eq. (8.6.24), is the estimated income elasticity equal to 1? Is the price elasticity equal to −1?

8.20. For the demand function in Eq. (8.6.24), how would you test the hypothesis that the income elasticity is equal in value but opposite in sign to the price elasticity of demand? Show the necessary calculations. (Note: $\text{cov}(\hat{\beta}_2, \hat{\beta}_3) = -0.00142$.)

8.21. Refer to the demand for roses function of Exercise 7.16. Confining your considerations to the logarithmic specification,

a. What is the estimated own-price elasticity of demand (i.e., elasticity with respect 
to the price of roses)? 

b. Is it statistically significant? 

c. If so, is it significantly different from unity? 

d. A priori, what are the expected signs of $X_3$ (price of carnations) and $X_4$ (income)? Are the empirical results in accord with these expectations?

e. If the coefficients of $X_3$ and $X_4$ are statistically insignificant, what may be the reasons?

8.22. Refer to Exercise 7.17 relating to wildcat activity. 

a. Is each of the estimated slope coefficients individually statistically significant at 
the 5 percent level? 

b. Would you reject the hypothesis that R 2 = 0? 

c. What is the instantaneous rate of growth of wildcat activity over the period 
1948-1978? The corresponding compound rate of growth? 

8.23. Refer to the U.S. defense budget outlay regression estimated in Exercise 7.18. 

a. Comment generally on the estimated regression results. 

b. Set up the ANOVA table and test the hypothesis that all the partial slope coefficients are zero.

8.24. The following is known as the transcendental production function (TPF), a generalization of the well-known Cobb-Douglas production function:

$$Y_i = \beta_1 L_i^{\beta_2} K_i^{\beta_3} e^{\beta_4 L_i + \beta_5 K_i}$$

where Y = output, L = labor input, and K = capital input.

After taking logarithms and adding the stochastic disturbance term, we obtain the stochastic TPF as

$$\ln Y_i = \beta_0 + \beta_2 \ln L_i + \beta_3 \ln K_i + \beta_4 L_i + \beta_5 K_i + u_i$$

where $\beta_0 = \ln \beta_1$.

a. What are the properties of this function?

b. For the TPF to reduce to the Cobb-Douglas production function, what must be the values of $\beta_4$ and $\beta_5$?
c. If you had the data, how would you go about finding out whether the TPF reduces to the Cobb-Douglas production function? What testing procedure would you use?

d. See if the TPF fits the data given in Table 8.8. Show your calculations. 

8.25. Energy prices and capital formation: United States, 1948-1978. To test the hypothesis that a rise in the price of energy relative to output leads to a decline in the productivity of existing capital and labor resources, John A. Tatom estimated the following production function for the United States for the quarterly period 1948-I to 1978-II:*

$$\widehat{\ln(y/k)} = 1.5492 + 0.7135\ln(h/k) - 0.1081\ln(P_e/P) + 0.0045t$$
$$(16.33)\quad(21.69)\quad(-6.42)\quad(15.86) \qquad R^2 = 0.98$$

where y = real output in the private business sector
k = a measure of the flow of capital services
h = person hours in the private business sector
$P_e$ = producer price index for fuel and related products
P = private business sector price deflator
t = time

The numbers in parentheses are t statistics.

a. Do the results support the author’s hypothesis? 

b. Between 1972 and 1977 the relative price of energy, $(P_e/P)$, increased by 60 percent. From the estimated regression, what is the loss in productivity?

c. After allowing for the changes in $(h/k)$ and $(P_e/P)$, what has been the trend rate of growth of productivity over the sample period?

d. How would you interpret the coefficient value of 0.7135?

e. Does the fact that each estimated partial slope coefficient is individually statistically significant (why?) mean we can reject the hypothesis that $R^2 = 0$? Why or why not?

8.26. The demand for cable. Table 8.10 gives data used by a telephone cable manufacturer to predict sales to a major customer for the period 1968-1983.†

The variables in the table are defined as follows:

Y = annual sales in MPF, million paired feet
$X_2$ = gross national product (GNP), billions of dollars
$X_3$ = housing starts, thousands of units
$X_4$ = unemployment rate, %
$X_5$ = prime rate lagged 6 months
$X_6$ = customer line gains, %


*See his "Energy Prices and Capital Formation: 1972-1977," Review, Federal Reserve Bank of St. Louis, vol. 61, no. 5, May 1979, p. 4.

†I am indebted to Daniel J. Reardon for collecting and processing the data.
TABLE 8.10 Regression Variables

Columns: X2 = GNP; X3 = housing starts; X4 = unemployment, %; X5 = prime rate, lagged 6 months; X6 = customer line gains, %; Y = annual sales (MPF).

Year       X2        X3     X4     X5     X6       Y
1968   1051.8    1503.6    3.6    5.8    5.9    5873
1969   1078.8    1486.7    3.5    6.7    4.5    7852
1970   1075.3    1434.8    5.0    8.4    4.2    8189
1971   1107.5    2035.6    6.0    6.2    4.2    7497
1972   1171.1    2360.8    5.6    5.4    4.9    8534
1973   1235.0    2043.9    4.9    5.9    5.0    8688
1974   1217.8    1331.9    5.6    9.4    4.1    7270
1975   1202.3    1160.0    8.5    9.4    3.4    5020
1976   1271.0    1535.0    7.7    7.2    4.2    6035
1977   1332.7    1961.8    7.0    6.6    4.5    7425
1978   1399.2    2009.3    6.0    7.6    3.9    9400
1979   1431.6    1721.9    6.0   10.6    4.4    9350
1980   1480.7    1298.0    7.2   14.9    3.9    6540
1981   1510.3    1100.0    7.6   16.6    3.1    7675
1982   1492.2    1039.0    9.2   17.5    0.6    7419
1983   1535.4    1200.0    8.8   16.0    1.5    7923


You are to consider the following model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + \beta_6 X_{6t} + u_t$$

a. Estimate the preceding regression.

b. What are the expected signs of the coefficients of this model?

c. Are the empirical results in accordance with prior expectations?

d. Are the estimated partial regression coefficients individually statistically significant at the 5 percent level of significance?

e. Suppose you first regress Y on $X_2$, $X_3$, and $X_4$ only and then decide to add the variables $X_5$ and $X_6$. How would you find out if it is worth adding the variables $X_5$ and $X_6$? Which test do you use? Show the necessary calculations.

8.27. Marc Nerlove has estimated the following cost function for electricity generation:*

$$Y = AX^{\beta}P_1^{\alpha_1}P_2^{\alpha_2}P_3^{\alpha_3}u \qquad (1)$$

where Y = total cost of production
X = output in kilowatt hours
$P_1$ = price of labor input
$P_2$ = price of capital input
$P_3$ = price of fuel
u = disturbance term


*Marc Nerlove, "Returns to Scale in Electric Supply," in Carl Christ, ed., Measurement in Economics, Stanford University Press, Palo Alto, Calif., 1963. The notation has been changed.
Theoretically, the sum of the price elasticities is expected to be unity, i.e., $(\alpha_1 + \alpha_2 + \alpha_3) = 1$. By imposing this restriction, the preceding cost function can be written as

$$(Y/P_3) = AX^{\beta}(P_1/P_3)^{\alpha_1}(P_2/P_3)^{\alpha_2}u \qquad (2)$$

In other words, (1) is the unrestricted and (2) is the restricted cost function.

On the basis of a sample of 29 medium-sized firms, and after logarithmic transformation, Nerlove obtained the following regression results:

$$\widehat{\ln Y_i} = -4.93 + 0.94\ln X_i + 0.31\ln P_1 - 0.26\ln P_2 + 0.44\ln P_3 \qquad (3)$$
$$se = (1.96)\quad(0.11)\quad(0.23)\quad(0.29)\quad(0.07) \qquad RSS = 0.336$$

$$\widehat{\ln(Y/P_3)} = -6.55 + 0.91\ln X + 0.51\ln(P_1/P_3) + 0.09\ln(P_2/P_3) \qquad (4)$$
$$se = (0.16)\quad(0.11)\quad(0.19)\quad(0.16) \qquad RSS = 0.364$$

a. Interpret Eqs. (3) and (4).

b. How would you find out if the restriction $(\alpha_1 + \alpha_2 + \alpha_3) = 1$ is valid? Show your calculations.
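Part (b) of Exercise 8.27 calls for the restricted least-squares F test, $F = [(RSS_R - RSS_{UR})/m]/[RSS_{UR}/(n-k)]$. Here m is the number of restrictions and k the number of parameters in the unrestricted model. The two RSS values are comparable because subtracting $\ln P_3$ from both sides of the restricted model does not change its residuals. The Python sketch below assumes m = 1, n = 29, and k = 5 for Eq. (3). The resulting F would then be compared with a tabulated $F_{1,24}$ value.

```python
def restricted_f(rss_r: float, rss_ur: float, m: int, n: int, k_ur: int) -> float:
    """F = [(RSS_R - RSS_UR)/m] / [RSS_UR/(n - k_UR)], distributed F(m, n - k_UR) under H0."""
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k_ur))

f_stat = restricted_f(rss_r=0.364, rss_ur=0.336, m=1, n=29, k_ur=5)
print(round(f_stat, 2))
```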

8.28. Estimating the capital asset pricing model (CAPM). In Section 6.1 we considered briefly the well-known capital asset pricing model of modern portfolio theory. In empirical analysis, the CAPM is estimated in two stages.

Stage I (Time-series regression). For each of the N securities included in the sample, we run the following regression over time:

$$R_{it} = \alpha_i + \beta_i R_{mt} + e_{it} \qquad (1)$$

where $R_{it}$ and $R_{mt}$ are the rates of return on the ith security and on the market portfolio (say, the S&P 500) in year t; $\beta_i$, as noted elsewhere, is the Beta or market volatility coefficient of the ith security, and $e_{it}$ are the residuals. In all there are N such regressions, one for each security, giving therefore N estimates of $\beta_i$.

Stage II (Cross-section regression). In this stage we run the following regression over the N securities:

$$\bar{R}_i = \gamma_1 + \gamma_2\hat{\beta}_i + u_i \qquad (2)$$

where $\bar{R}_i$ is the average or mean rate of return for security i computed over the sample period covered by Stage I, $\hat{\beta}_i$ is the estimated beta coefficient from the first-stage regression, and $u_i$ is the residual term.

Comparing the second-stage regression (2) with the CAPM Eq. (6.1.2), written as

$$ER_i = r_f + \beta_i(ER_m - r_f) \qquad (3)$$

where $r_f$ is the risk-free rate of return, we see that $\gamma_1$ is an estimate of $r_f$ and $\gamma_2$ is an estimate of $(ER_m - r_f)$, the market risk premium.
Thus, in the empirical testing of CAPM, $\bar{R}_i$ and $\hat{\beta}_i$ are used as estimators of $ER_i$ and $\beta_i$, respectively. Now if CAPM holds, statistically,

$$\gamma_1 = r_f$$

$$\gamma_2 = \bar{R}_m - r_f, \text{ the estimator of } (ER_m - r_f)$$

Next consider an alternative model:

$$\bar{R}_i = \gamma_1 + \gamma_2\hat{\beta}_i + \gamma_3 s_i^2 + u_i \qquad (4)$$

where $s_i^2$ is the residual variance of the ith security from the first-stage regression. Then, if CAPM is valid, $\gamma_3$ should not be significantly different from zero.
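The two-stage procedure of Exercise 8.28 can be sketched in Python with NumPy. The sketch runs the N Stage I regressions in a single least-squares call and computes each security's residual variance $s_i^2$. It then fits the cross-section model (4); dropping the $s_i^2$ column gives model (2). The data are simulated so that the CAPM holds; they are not Levy's data. The function name is illustrative. Because $\hat{\beta}_i$ is itself estimated, the Stage II standard errors computed this way ignore the resulting errors-in-variables problem.

```python
import numpy as np

def capm_two_stage(R, Rm):
    """Two-stage CAPM test.

    R:  (T, N) array of returns, one column per security.
    Rm: (T,) array of market returns.
    Returns the Stage II coefficients (gamma1, gamma2, gamma3) of model (4) and their t ratios.
    """
    T, N = R.shape
    Xm = np.column_stack([np.ones(T), Rm])
    # Stage I: all N time-series regressions at once (one column of R per security)
    coef, *_ = np.linalg.lstsq(Xm, R, rcond=None)
    betas = coef[1]
    resid = R - Xm @ coef
    s2 = (resid ** 2).sum(axis=0) / (T - 2)      # residual variance of each security
    # Stage II: cross-section regression of mean returns on beta and s2
    Z = np.column_stack([np.ones(N), betas, s2])
    rbar = R.mean(axis=0)
    gamma, *_ = np.linalg.lstsq(Z, rbar, rcond=None)
    u = rbar - Z @ gamma
    sigma2 = u @ u / (N - Z.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    return gamma, gamma / se

# Simulated data only: 21 years, 101 securities
rng = np.random.default_rng(42)
T, N, rf = 21, 101, 0.04
Rm = rf + 0.06 + rng.normal(0, 0.15, T)
true_beta = rng.uniform(0.5, 1.5, N)
R = rf + true_beta * (Rm[:, None] - rf) + rng.normal(0, 0.10, (T, N))
gamma, t_ratios = capm_two_stage(R, Rm)
print(gamma, t_ratios)
```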

To test the CAPM, Levy ran regressions (2) and (4) on a sample of 101 stocks for the period 1948-1968 and obtained the following results:*

$$\hat{\bar{R}}_i = 0.109 + 0.037\hat{\beta}_i \qquad (2)'$$

$$(0.009)\quad(0.008)$$

$$t = (12.0)\quad(5.1) \qquad R^2 = 0.21$$

$$\hat{\bar{R}}_i = 0.106 + 0.0024\hat{\beta}_i + 0.201s_i^2 \qquad (4)'$$

$$(0.008)\quad(0.007)\quad(0.038)$$

$$t = (13.2)\quad(3.3)\quad(5.3) \qquad R^2 = 0.39$$

a. Are these results supportive of the CAPM? 

b. Is it worth adding the variable s 2 to the model? How do you know? 

c. If the CAPM holds, $\gamma_1$ in (2)' should approximate the average value of the risk-free rate, $r_f$. The estimated value is 10.9 percent. Does this seem a reasonable estimate of the risk-free rate of return during the observation period, 1948-1968? (You may consider the rate of return on Treasury bills or a similar comparatively risk-free asset.)

d. If the CAPM holds, the market risk premium $(\bar{R}_m - r_f)$ from (2)' is about 3.7 percent. If $r_f$ is assumed to be 10.9 percent, this implies $\bar{R}_m$ for the sample period was about 14.6 percent. Does this sound like a reasonable estimate?

e. What can you say about the CAPM generally? 

8.29. Refer to Exercise 7.21c. Now that you have the necessary tools, which test(s) would 
you use to choose between the two models? Show the necessary computations. Note 
that the dependent variables in the two models are different. 

8.30. Refer to Example 8.3. Use the t test as shown in Eq. (8.6.4) to find out if there were 
constant returns to scale in the Mexican economy for the period of the study. 

*H. Levy, "Equilibrium in an Imperfect Market: A Constraint on the Number of Securities in the Portfolio," American Economic Review, vol. 68, no. 4, September 1978, pp. 643-658.

8.31. Return to the child mortality example that we have discussed several times. In regression (7.6.2) we regressed child mortality (CM) on per capita GNP (PGNP) and female literacy rate (FLR). Now we extend this model by including total fertility rate (TFR). The data on all these variables are already given in Table 6.4. We reproduce regression (7.6.2) and give results of the extended regression model below:

1. $$\widehat{CM}_i = 263.6416 - 0.0056\,PGNP_i - 2.2316\,FLR_i \qquad (7.6.2)$$

$$se = (11.5932)\quad(0.0019)\quad(0.2099) \qquad R^2 = 0.7077$$

2. $$\widehat{CM}_i = 168.3067 - 0.0055\,PGNP_i - 1.7680\,FLR_i + 12.8686\,TFR_i$$

$$se = (32.8916)\quad(0.0018)\quad(0.2480)\quad(?)$$

$$R^2 = 0.7474$$

a. How would you interpret the coefficient of TFR? A priori, would you expect a 
positive or negative relationship between CM and TFR? Justify your answer. 

b. Have the coefficient values of PGNP and FLR changed between the two equations? If so, what may be the reason(s) for such a change? Is the observed difference statistically significant? Which test do you use and why?

c. How would you choose between models 1 and 2? Which statistical test would you 
use to answer this question? Show the necessary calculations. 

d. We have not given the standard error of the coefficient of TFR. Can you find it out? (Hint: Recall the relationship between the t and F distributions.)

8.32. Return to Exercise 1.7, which gave data on advertising impressions retained and 
advertising expenditure for a sample of 21 firms. In Exercise 5.11 you were asked to 
plot these data and decide on an appropriate model about the relationship between 
impressions and advertising expenditure. Letting Y represent impressions retained 
and X the advertising expenditure, the following regressions were obtained: 

Model I:
$$\hat{Y}_i = 22.163 + 0.3631X_i$$

$$se = (7.089)\quad(0.0971) \qquad r^2 = 0.424$$

Model II:
$$\hat{Y}_i = 7.059 + 1.0847X_i - 0.0040X_i^2$$

$$se = (9.986)\quad(0.3699)\quad(0.0019) \qquad R^2 = 0.53$$

a. Interpret both models. 

b. Which is a better model? Why? 

c. Which statistical test(s) would you use to choose between the two models? 

d. Are there “diminishing returns” to advertising expenditure, that is, after a certain level of advertising expenditure (the saturation level), does it not pay to advertise? Can you find out what that level of expenditure might be? Show the necessary calculations.
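Taking the Model II estimates of Exercise 8.32 at face value, impressions stop rising at the expenditure level where $dY/dX = 1.0847 - 2(0.0040)X = 0$. Because the coefficient of $X^2$ is negative, this turning point is a maximum. A minimal Python sketch of that arithmetic follows; the units of X are those of the data in Exercise 1.7.

```python
a0, a1, a2 = 7.059, 1.0847, -0.0040   # Model II intercept, X and X^2 coefficients

x_star = -a1 / (2 * a2)               # solve dY/dX = a1 + 2*a2*X = 0
y_star = a0 + a1 * x_star + a2 * x_star ** 2   # fitted impressions at the turning point
print(x_star, y_star)
```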

8.33. In regression (7.9.4), we presented the results of the Cobb-Douglas production function fitted to the manufacturing sector of all 50 states and Washington, DC, for 2005. On the basis of that regression, find out if there are constant returns to scale in that sector, using

a. The t test given in Eq. (8.6.4). You are told that the covariance between the two slope estimators is −0.03843.

b. The F test given in Eq. (8.6.9).

c. Is there a difference in the two test results? And what is your conclusion regarding the returns to scale in the manufacturing sector of the 50 states and Washington, DC, over the sample period?
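For part (a) of Exercise 8.33, a t statistic for $H_0: \beta_2 + \beta_3 = 1$ uses $\text{var}(\hat{\beta}_2 + \hat{\beta}_3) = \text{var}(\hat{\beta}_2) + \text{var}(\hat{\beta}_3) + 2\,\text{cov}(\hat{\beta}_2, \hat{\beta}_3)$. It can be sketched in Python as below. The numbers in the usage line are hypothetical, because the estimates and variances of regression (7.9.4) are not reproduced here.

```python
import math

def t_sum_restriction(b2, b3, var_b2, var_b3, cov_b23, value=1.0):
    """t statistic for H0: beta2 + beta3 = value."""
    se_sum = math.sqrt(var_b2 + var_b3 + 2 * cov_b23)   # se of (b2 + b3)
    return (b2 + b3 - value) / se_sum

# Hypothetical estimates, variances, and covariance
t_stat = t_sum_restriction(0.5, 0.6, var_b2=0.01, var_b3=0.02, cov_b23=-0.005)
print(round(t_stat, 4))
```

A negative covariance between the slope estimators shrinks the standard error of their sum, which is why the exercise supplies it.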



Chapter 8 Multiple Regression Analysis: The Problem of Inference 273 


8.34. Reconsider the savings-income regression in Section 8.7. Suppose we divide the 
sample into two periods as 1970-1982 and 1983-1995. Using the Chow test, decide 
if there is a structural change in the savings-income regression in the two periods. 
Comparing your results with those given in Section 8.7, what overall conclusion do 
you draw about the sensitivity of the Chow test to the choice of the break point that 
divides the sample into two (or more) periods? 
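The Chow test compares the RSS of the pooled regression with the sum of the RSS values of the two subperiod regressions: $F = [(RSS_R - RSS_{UR})/k]/[RSS_{UR}/(n_1 + n_2 - 2k)]$, where $RSS_{UR} = RSS_1 + RSS_2$. The Python sketch below applies this to a regression of savings on income using the 1970-1995 rows of Table 8.11. The break is placed after 1981 purely for illustration. Because the break point is a parameter, other splits, such as the one this exercise asks for, can be tried. The resulting F is compared with a tabulated F value with $(k, n_1 + n_2 - 2k)$ df.

```python
import numpy as np

# Savings and disposable income, 1970-1995 (billions of dollars), from Table 8.11
savings = np.array([69.5, 80.6, 77.2, 102.7, 113.6, 125.6, 122.3, 125.3, 142.5,
                    159.1, 201.4, 244.3, 270.8, 233.6, 314.8, 280.0, 268.4, 241.4,
                    272.9, 287.1, 299.4, 324.2, 366.0, 284.0, 249.5, 250.9])
income = np.array([735.7, 801.8, 869.1, 978.3, 1071.6, 1187.4, 1302.5, 1435.7,
                   1608.3, 1793.5, 2009.0, 2246.1, 2421.2, 2608.4, 2912.0, 3109.3,
                   3285.1, 3458.3, 3748.7, 4021.7, 4285.8, 4464.3, 4751.4, 4911.9,
                   5151.8, 5408.2])

def rss(y, x):
    """Residual sum of squares from the OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return float(e @ e)

def chow_f(y, x, split, k=2):
    """Chow F statistic for a break after the first `split` observations."""
    rss_r = rss(y, x)                                    # pooled (restricted) fit
    rss_ur = rss(y[:split], x[:split]) + rss(y[split:], x[split:])
    n = len(y)
    return ((rss_r - rss_ur) / k) / (rss_ur / (n - 2 * k))

f_stat = chow_f(savings, income, split=12)   # 1970-1981 vs. 1982-1995
print(round(f_stat, 2))
```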

8.35. Refer to Exercise 7.24 and the data in Table 7.12 concerning four economic variables 
in the U.S. from 1947-2000. 

a. Based on the regression of consumption expenditure on real income, real 
wealth and real interest rate, find out which of the regression coefficients are 
individually statistically significant at the 5 percent level of significance. Are the 
signs of the estimated coefficients in accord with economic theory? 

b. Based on the results in (a), how would you estimate the income, wealth, and interest rate elasticities? What additional information, if any, do you need to compute the elasticities?

c. How would you test the hypothesis that the income and wealth elasticities are the 
same? Show the necessary calculations. 

d. Suppose instead of the linear consumption function estimated in (a), you regress 
the logarithm of consumption expenditure on the logarithms of income and 
wealth and the interest rate. Show the regression results. How would you interpret 
the results? 

e. What are the income and wealth elasticities estimated in (d)? How would you interpret the coefficient of the interest rate estimated in (d)?

f. In the regression in (d), could you have used the logarithm of the interest rate instead of the interest rate? Why or why not?

g. How would you compare the elasticities estimated in (b) and in (d)?

h. Between the regression models estimated in (a) and (d), which would you prefer? Why?

i. Suppose instead of estimating the model given in (d), you only regress the logarithm of consumption expenditure on the logarithm of income. How would you decide if it is worth adding the logarithm of wealth in the model? And how would you decide if it is worth adding both the logarithm of wealth and interest rate variables in the model? Show the necessary calculations.

8.36. Refer to Section 8.8 and the data in Table 8.9 concerning disposable personal income and personal savings for the period 1970-1995. In that section, the Chow test was introduced to see if a structural change occurred within the data between two time periods. Table 8.11 includes updated data containing the values from 1970-2005. According to the National Bureau of Economic Research, the most recent U.S. business contraction cycle ended in late 2001. Split the data into three sections: (1) 1970-1981, (2) 1982-2001, and (3) 2002-2005.

a. Estimate both the model for the full dataset (years 1970-2005) and the third 
section (post-2002). Using the Chow test, determine if there is a significant break 
between the third period and the full dataset. 

b. With this new data in Table 8.11, determine if there is still a significant difference 
between the first set of years (1970-1981) and the full dataset, now that there are 
more observations available. 

c. Perform the Chow test on the middle period (1982-2001) versus the full dataset to 
see if the data in this period behave significantly differently than the rest of the data. 
TABLE 8.11 Savings and Personal Disposable Income (billions of dollars), United States, 1970-2005 (billions of dollars, except as noted; quarterly data at seasonally adjusted annual rates)

Year   Savings    Income
1970      69.5     735.7
1971      80.6     801.8
1972      77.2     869.1
1973     102.7     978.3
1974     113.6   1,071.6
1975     125.6   1,187.4
1976     122.3   1,302.5
1977     125.3   1,435.7
1978     142.5   1,608.3
1979     159.1   1,793.5
1980     201.4   2,009.0
1981     244.3   2,246.1
1982     270.8   2,421.2
1983     233.6   2,608.4
1984     314.8   2,912.0
1985     280.0   3,109.3
1986     268.4   3,285.1
1987     241.4   3,458.3
1988     272.9   3,748.7
1989     287.1   4,021.7
1990     299.4   4,285.8
1991     324.2   4,464.3
1992     366.0   4,751.4
1993     284.0   4,911.9
1994     249.5   5,151.8
1995     250.9   5,408.2
1996     228.4   5,688.5
1997     218.3   5,988.8
1998     276.8   6,395.9
1999     158.6   6,695.0
2000     168.5   7,194.0
2001     132.3   7,486.8
2002     184.7   7,830.1
2003     174.9   8,162.5
2004     174.3   8,681.6
2005      34.8   9,036.1

Source: Department of Commerce, Bureau of Economic Analysis.
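As a rough sketch of how parts (a) to (c) of the exercise might be computed, the Python code below (assuming NumPy is available) implements the Chow test in its usual form: the restricted RSS comes from one regression of savings on income over all 36 years, and the unrestricted RSS is the sum of the RSS from the chosen period and from the remaining years. Reading "versus the full dataset" as "versus the remaining years" is our assumption, and the helper names `rss` and `chow_f` are hypothetical.

```python
import numpy as np

# Table 8.11 (billions of dollars), 1970-2005
YEARS = np.arange(1970, 2006)
SAVINGS = np.array([
    69.5, 80.6, 77.2, 102.7, 113.6, 125.6, 122.3, 125.3, 142.5, 159.1,
    201.4, 244.3, 270.8, 233.6, 314.8, 280.0, 268.4, 241.4, 272.9, 287.1,
    299.4, 324.2, 366.0, 284.0, 249.5, 250.9, 228.4, 218.3, 276.8, 158.6,
    168.5, 132.3, 184.7, 174.9, 174.3, 34.8])
INCOME = np.array([
    735.7, 801.8, 869.1, 978.3, 1071.6, 1187.4, 1302.5, 1435.7, 1608.3, 1793.5,
    2009.0, 2246.1, 2421.2, 2608.4, 2912.0, 3109.3, 3285.1, 3458.3, 3748.7, 4021.7,
    4285.8, 4464.3, 4751.4, 4911.9, 5151.8, 5408.2, 5688.5, 5988.8, 6395.9, 6695.0,
    7194.0, 7486.8, 7830.1, 8162.5, 8681.6, 9036.1])


def rss(y, x):
    """Residual sum of squares from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_f(y, x, in_period, k=2):
    """Chow F statistic for a break between the years in `in_period` and all other years.

    F = [(RSS_R - RSS_UR) / k] / [RSS_UR / (n - 2k)], with RSS_UR = RSS_1 + RSS_2;
    under the null of no structural change, F follows F(k, n - 2k).
    """
    rss_r = rss(y, x)  # pooled (restricted) regression
    rss_ur = rss(y[in_period], x[in_period]) + rss(y[~in_period], x[~in_period])
    df2 = len(y) - 2 * k
    f_stat = ((rss_r - rss_ur) / k) / (rss_ur / df2)
    return f_stat, (k, df2)


for label, mask in [("(a) 2002-2005", YEARS >= 2002),
                    ("(b) 1970-1981", YEARS <= 1981),
                    ("(c) 1982-2001", (YEARS >= 1982) & (YEARS <= 2001))]:
    f_stat, dfs = chow_f(SAVINGS, INCOME, mask)
    print(f"{label}: F = {f_stat:.3f}, df = {dfs}")
```

Each F value is then compared with the critical F value for the printed degrees of freedom; if SciPy happens to be installed, `scipy.stats.f.sf(f_stat, *dfs)` returns the corresponding p value.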


Appendix 8A2

Likelihood Ratio (LR) Test*

The LR test is based on the maximum likelihood (ML) principle discussed in Appendix 4A, where we showed how one obtains the ML estimators of the two-variable regression model. The principle can be straightforwardly extended to the multiple regression model. Under the assumption that the disturbances $u_i$ are normally distributed, we showed that, for the two-variable regression model, the OLS and ML estimators of the regression coefficients are identical, but the estimated error variances are different. The OLS estimator of $\sigma^2$ is $\sum \hat{u}_i^2/(n-2)$, but the ML estimator is $\sum \hat{u}_i^2/n$, the former being unbiased and the latter biased, although in large samples the bias tends to disappear.

*Optional.

The same is true in the multiple regression case. To illustrate, consider the three-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \tag{1}$$

Corresponding to Eq. (5) of Appendix 4A, the log-likelihood function for the model (1) can be written as:

$$\ln LF = -\frac{n}{2}\ln\sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum\left(Y_i - \beta_1 - \beta_2 X_{2i} - \beta_3 X_{3i}\right)^2 \tag{2}$$

As shown in Appendix 4A, differentiating this function with respect to $\beta_1$, $\beta_2$, $\beta_3$, and $\sigma^2$, setting the resulting expressions to zero, and solving, we obtain the ML estimators of these parameters. The ML estimators of $\beta_1$, $\beta_2$, and $\beta_3$ will be identical to the OLS estimators, which are already given in Eqs. (7.4.6) to (7.4.8), but the error variance will be different in that the residual sum of squares (RSS) will be divided by $n$ rather than by $(n - 3)$, as in the case of OLS.
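A small numerical sketch of this point, using made-up data (the true coefficients, the sample size of 20, and the random seed are arbitrary assumptions) and assuming NumPy is available: the coefficient estimates are common to OLS and ML, while the two error-variance estimators differ only in the divisor, $n - 3$ versus $n$. The function name `three_variable_fit` is hypothetical.

```python
import numpy as np


def three_variable_fit(y, x2, x3):
    """Coefficients (identical under OLS and ML) plus the OLS and ML error-variance estimates."""
    n = len(y)
    X = np.column_stack([np.ones(n), x2, x3])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    sigma2_ols = rss / (n - 3)  # unbiased: RSS divided by n - 3
    sigma2_ml = rss / n         # ML: RSS divided by n (biased downward in small samples)
    return beta, sigma2_ols, sigma2_ml


# Made-up data generated from Y_i = 1 + 2 X2i - 0.5 X3i + u_i
rng = np.random.default_rng(0)
n = 20
x2 = rng.uniform(0, 10, n)
x3 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x2 - 0.5 * x3 + rng.normal(0, 1, n)

beta, s2_ols, s2_ml = three_variable_fit(y, x2, x3)
print(beta, s2_ols, s2_ml)
print(s2_ml / s2_ols)  # equals (n - 3) / n = 17/20 up to floating-point rounding
```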

Now let us suppose that our null hypothesis $H_0$ is that $\beta_3$, the coefficient of $X_3$, is zero. In this case, the log LF given in (2) will become

$$\ln LF = -\frac{n}{2}\ln\sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum\left(Y_i - \beta_1 - \beta_2 X_{2i}\right)^2 \tag{3}$$

Equation (3) is known as the restricted log-likelihood function (RLLF) because it is estimated with the restriction that a priori $\beta_3$ is zero, whereas Eq. (2) is known as the unrestricted log LF (ULLF) because a priori there are no restrictions put on the parameters. To test the validity of the a priori restriction that $\beta_3$ is zero, the LR test obtains the following test statistic:

$$\lambda = 2(\text{ULLF} - \text{RLLF}) \tag{4}$$*

where ULLF and RLLF are, respectively, the values of the unrestricted log-likelihood function (Eq. [2]) and the restricted log-likelihood function (Eq. [3]), each evaluated at its ML estimates. If the sample size is large, it can be shown that the test statistic $\lambda$ given in Eq. (4) follows the chi-square ($\chi^2$) distribution with df equal to the number of restrictions imposed by the null hypothesis, 1 in the present case.

The basic idea behind the LR test is simple: If the a priori restriction(s) is valid, the restricted and unrestricted (log) LF should not be very different, in which case $\lambda$ in Eq. (4) will be close to zero. But if that is not the case, the two LFs will diverge. And since in a large sample we know that $\lambda$ follows the chi-square distribution, we can find out if the divergence is statistically significant, say, at a 1 or 5 percent level of significance. Or else, we can find out the p value of the estimated $\lambda$.

Let us illustrate the LR test with our child mortality example. If we regress child mortality (CM) on per capita GNP (PGNP) and female literacy rate (FLR) as we did in Eq. (8.1.4), we obtain a ULLF of -328.1012, but if we regress CM on PGNP only, we obtain an RLLF of -361.6396. In absolute value (i.e., disregarding the sign), the former is smaller than the latter, which makes sense since we have an additional variable in the former model.

The question now is whether it is worth adding the FLR variable. If it is not, the restricted and unrestricted LLF should not differ much, but if it is, the LLFs will be different. To see if this difference is statistically significant, we now use the LR test given in Eq. (4), which gives:

$$\lambda = 2[-328.1012 - (-361.6396)] = 67.0768$$


*This expression can also be expressed as $-2(\text{RLLF} - \text{ULLF})$ or as $-2\ln(\text{RLF}/\text{ULF})$.
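The arithmetic of the LR statistic for the child mortality example, and its asymptotic p value, can be checked with a few lines of standard-library Python. The sketch relies on the identity that, for 1 df, $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, since a $\chi^2_1$ variable is the square of a standard normal; this shortcut is our choice and applies only to the 1-df case.

```python
import math

ULLF = -328.1012  # unrestricted: CM regressed on PGNP and FLR, Eq. (8.1.4)
RLLF = -361.6396  # restricted: CM regressed on PGNP only

lam = 2 * (ULLF - RLLF)  # Eq. (4)

# For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lam / 2))
print(f"LR = {lam:.4f}, p value = {p_value:.2e}")
```

As a sanity check on the formula, plugging in 3.8415 (the 5 percent critical value of $\chi^2_1$) returns a tail probability of about 0.05.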
Asymptotically, this is distributed as the chi-square distribution with 1 df (because we have only one restriction imposed when we omitted the FLR variable from the full model). The p value of obtaining such a chi-square value for 1 df is almost zero, leading to the conclusion that the FLR variable should not be excluded from the model. In other words, the restricted regression in the present instance is not valid.

Letting RRSS and URSS denote the restricted and unrestricted residual sums of squares, Eq. (4) can also be expressed as:

$$\lambda = n(\ln \text{RRSS} - \ln \text{URSS}) \tag{5}$$

which is distributed as $\chi^2$ with $r$ degrees of freedom, where $r$ is the number of restrictions imposed on the model (i.e., the number of coefficients omitted from the original model).
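Why Eq. (5) follows from Eq. (4) can be seen numerically: with $\sigma^2$ replaced by its ML value RSS/n, the maximized normal log-likelihood is $-\frac{n}{2}[\ln(2\pi) + \ln(\text{RSS}/n) + 1]$, so twice the difference between the unrestricted and restricted values collapses to $n(\ln \text{RRSS} - \ln \text{URSS})$. The sketch below assumes NumPy and uses made-up data; the helper name `max_loglik` is hypothetical.

```python
import numpy as np


def max_loglik(y, X):
    """Maximized normal log-likelihood of y = X b + u, with sigma^2 set to its ML value RSS/n."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    ll = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    return ll, rss


# Made-up data for Y_i = beta1 + beta2 X2i + beta3 X3i + u_i
rng = np.random.default_rng(1)
n = 40
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 0.8 * x2 + 0.5 * x3 + rng.normal(size=n)

X_u = np.column_stack([np.ones(n), x2, x3])  # unrestricted model
X_r = np.column_stack([np.ones(n), x2])      # restricted model: beta3 = 0

ullf, urss = max_loglik(y, X_u)
rllf, rrss = max_loglik(y, X_r)
lr_from_loglik = 2 * (ullf - rllf)               # Eq. (4)
lr_from_rss = n * (np.log(rrss) - np.log(urss))  # Eq. (5)
print(lr_from_loglik, lr_from_rss)               # agree up to rounding error
```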

Although we will not go into the details of the Wald and LM tests, these tests can be implemented as follows:

Wald statistic:

$$W = \frac{(n-k)(\text{RRSS} - \text{URSS})}{\text{URSS}} \sim \chi^2_r$$

Lagrange multiplier statistic:

$$\text{LM} = \frac{(n-k+r)(\text{RRSS} - \text{URSS})}{\text{RRSS}} \sim \chi^2_r$$

where $k$ is the number of regressors in the unrestricted model and $r$ is the number of restrictions.
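The three statistics need only the two residual sums of squares, so they are easy to compute side by side. The plain-Python sketch below uses the RSS forms just given; the function name `classical_tests` and the input numbers (n = 100, k = 3, r = 1, RRSS = 100, URSS = 80) are hypothetical.

```python
import math


def classical_tests(rrss, urss, n, k, r):
    """Wald, LR, and LM statistics in RSS form; each is asymptotically chi-square with r df."""
    wald = (n - k) * (rrss - urss) / urss
    lr = n * (math.log(rrss) - math.log(urss))
    lm = (n - k + r) * (rrss - urss) / rrss
    return wald, lr, lm


# Hypothetical inputs: 100 observations, 3 regressors, 1 restriction
w, lr, lm = classical_tests(rrss=100.0, urss=80.0, n=100, k=3, r=1)
print(f"W = {w:.3f}, LR = {lr:.3f}, LM = {lm:.3f}")  # W = 24.250, LR = 22.314, LM = 19.600
```

With these particular numbers the three values line up in the order W, LR, LM, the ordering discussed in the text.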

As you can see from the preceding equations, all three tests are asymptotically (i.e., in large samples) equivalent, that is, they give similar answers. However, in small samples the answers can differ. There is an interesting relationship among these statistics in that it can be shown that:

$$W \ge \text{LR} \ge \text{LM}$$

Therefore, in small samples, a hypothesis can be rejected by the Wald statistic but not rejected by the LM statistic.*

As noted in the text, for most of our purposes the t and F tests will suffice. But the three tests discussed above are of general applicability in that they can be applied to testing nonlinear hypotheses in linear models, or testing restrictions on variance-covariance matrices. They also can be applied in situations where the assumption that the errors are normally distributed is not tenable.

Because of the mathematical complexity of the Wald and LM tests, we will not go into more detail here. But as noted, asymptotically, the LR, Wald, and LM tests give identical answers, the choice of the test depending on computational convenience.


*For an explanation, see G. S. Maddala, Introduction to Econometrics, 3d ed., John Wiley & Sons, New York, 2001, p. 177.





Chapter 9

Dummy Variable Regression Models

In Chapter 1 we discussed briefly the four types of variables that one generally encounters in empirical analysis: ratio scale, interval scale, ordinal scale, and nominal scale. The types of variables that we have encountered in the preceding chapters were essentially ratio scale. But this should not give the impression that regression models can deal only with ratio scale variables. Regression models can also handle the other types of variables mentioned previously. In this chapter, we consider models that may involve not only ratio scale variables but also nominal scale variables. Such variables are also known as indicator variables, categorical variables, qualitative variables, or dummy variables.¹


9.1 The Nature of Dummy Variables 

In regression analysis the dependent variable, or regressand, is frequently influenced not only by ratio scale variables (e.g., income, output, prices, costs, height, temperature) but also by variables that are essentially qualitative, or nominal scale, in nature, such as sex, race, color, religion, nationality, geographical region, political upheavals, and party affiliation. For example, holding all other factors constant, female workers are found to earn less than their male counterparts, and nonwhite workers are found to earn less than whites.² This pattern may result from sex or racial discrimination, but whatever the reason, qualitative variables such as sex and race seem to influence the regressand and clearly should be included among the explanatory variables, or the regressors.

Since such variables usually indicate the presence or absence of a “quality” or an attribute, such as male or female, black or white, Catholic or non-Catholic, Democrat or Republican, they are essentially nominal scale variables. One way we could “quantify” such attributes is by constructing artificial variables that take on values of 1 or 0, 1 indicating the presence (or possession) of that attribute and 0 indicating the absence of that attribute. For example, 1 may indicate that a person is female and 0 that the person is male; or 1 may indicate that a person is a college graduate, and 0 that the person is not, and so on.


¹We will discuss ordinal scale variables in Chapter 15.

²For a review of the evidence on this subject, see Bruce E. Kaufman and Julie L. Hotchkiss, The Economics of Labor Markets, 5th ed., Dryden Press, New York, 2000.
Variables that assume such 0 and 1 values are called dummy variables.³ Such variables are thus essentially a device to classify data into mutually exclusive categories such as male or female.

Dummy variables can be incorporated in regression models just as easily as quantitative variables. As a matter of fact, a regression model may contain regressors that are all exclusively dummy, or qualitative, in nature. Such models are called Analysis of Variance (ANOVA) models.⁴

9.2 ANOVA Models 


To illustrate the ANOVA models, consider the following example. 


EXAMPLE 9.1

Public School Teachers’ Salaries by Geographical Region


Table 9.1 gives data on average salary (in dollars) of public school teachers in 50 states and 
the District of Columbia for the academic year 2005-2006. These 51 areas are classified 
into three geographical regions: (1) Northeast and North Central (21 states in all), 
(2) South (17 states in all), and (3) West (13 states in all). For the time being, do not worry 
about the format of the table and the other data given in the table. 

Suppose we want to find out if the average annual salary of public school teachers differs among the three geographical regions of the country. If you take the simple arithmetic average of the average salaries of the teachers in the three regions, you will find that these averages for the three regions are as follows: $49,538.71 (Northeast and North Central), $46,293.59 (South), and $48,014.62 (West). These numbers look different, but are they statistically different from one another? There are various statistical techniques to compare two or more mean values, which generally go by the name of analysis of variance.⁵ But the same objective can be accomplished within the framework of regression analysis.

To see this, consider the following model:

$$Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i \tag{9.2.1}$$

where $Y_i$ = (average) salary of public school teachers in state $i$
$D_{2i}$ = 1 if the state is in the Northeast or North Central
= 0 otherwise (i.e., in other regions of the country)
$D_{3i}$ = 1 if the state is in the South
= 0 otherwise (i.e., in other regions of the country)

Note that Eq. (9.2.1) is like any multiple regression model considered previously, except 
that, instead of quantitative regressors, we have only qualitative, or dummy, regressors. 


³It is not absolutely essential that dummy variables take the values of 0 and 1. The pair (0, 1) can be transformed into any other pair by a linear function such that $Z = a + bD$ ($b \neq 0$), where $a$ and $b$ are constants and where $D$ = 1 or 0. When $D$ = 1, we have $Z = a + b$, and when $D$ = 0, we have $Z = a$. Thus the pair (0, 1) becomes $(a, a + b)$. For example, if $a$ = 1 and $b$ = 2, the dummy variables will be (1, 3). This expression shows that qualitative, or dummy, variables do not have a natural scale of measurement. That is why they are described as nominal scale variables.

⁴ANOVA models are used to assess the statistical significance of the relationship between a quantitative regressand and qualitative or dummy regressors. They are often used to compare the differences in the mean values of two or more groups or categories, and are therefore more general than the t test, which can be used to compare the means of two groups or categories only.

⁵For an applied treatment, see John Fox, Applied Regression Analysis, Linear Models, and Related Methods, Sage Publications, 1997, Chapter 8.
TABLE 9.1 Average Salary of Public School Teachers by State, 2005-2006

State                 Salary  Spending  D2  D3     State           Salary  Spending  D2  D3
Connecticut           60,822    12,436   1   0     Georgia         49,905     8,534   0   1
Illinois              58,246     9,275   1   0     Kentucky        43,646     8,300   0   1
Indiana               47,831     8,935   1   0     Louisiana       42,816     8,519   0   1
Iowa                  43,130     7,807   1   0     Maryland        56,927     9,771   0   1
Kansas                43,334     8,373   1   0     Mississippi     40,182     7,215   0   1
Maine                 41,596    11,285   1   0     North Carolina  46,410     7,675   0   1
Massachusetts         58,624    12,596   1   0     Oklahoma        42,379     6,944   0   1
Michigan              54,895     9,880   1   0     South Carolina  44,133     8,377   0   1
Minnesota             49,634     9,675   1   0     Tennessee       43,816     6,979   0   1
Missouri              41,839     7,840   1   0     Texas           44,897     7,547   0   1
Nebraska              42,044     7,900   1   0     Virginia        44,727     9,275   0   1
New Hampshire         46,527    10,206   1   0     West Virginia   40,531     9,886   0   1
New Jersey            59,920    13,781   1   0     Alaska          54,658    10,171   0   0
New York              58,537    13,551   1   0     Arizona         45,941     5,585   0   0
North Dakota          38,822     7,807   1   0     California      63,640     8,486   0   0
Ohio                  51,937    10,034   1   0     Colorado        45,833     8,861   0   0
Pennsylvania          54,970    10,711   1   0     Hawaii          51,922     9,879   0   0
Rhode Island          55,956    11,089   1   0     Idaho           42,798     7,042   0   0
South Dakota          35,378     7,911   1   0     Montana         41,225     8,361   0   0
Vermont               48,370    12,475   1   0     Nevada          45,342     6,755   0   0
Wisconsin             47,901     9,965   1   0     New Mexico      42,780     8,622   0   0
Alabama               43,389     7,706   0   1     Oregon          50,911     8,649   0   0
Arkansas              44,245     8,402   0   1     Utah            40,566     5,347   0   0
Delaware              54,680    12,036   0   1     Washington      47,882     7,958   0   0
District of Columbia  59,000    15,508   0   1     Wyoming         50,692    11,596   0   0
Florida               45,308     7,762   0   1

Note: Salary = average annual salary (in dollars); Spending = spending on public school per pupil (in dollars); D2 = 1 for states in the Northeast and North Central, 0 otherwise; D3 = 1 for states in the South, 0 otherwise.


taking the value of 1 if the observation belongs to a particular category and 0 if it does not belong to that category or group. Hereafter, we shall designate all dummy variables by the letter D. Table 9.1 shows the dummy variables thus constructed.

What does the model (9.2.1) tell us? Assuming that the error term satisfies the usual OLS assumptions, on taking the expectation of both sides of Eq. (9.2.1), we obtain:

Mean salary of public school teachers in the Northeast and North Central:

$$E(Y_i \mid D_{2i} = 1, D_{3i} = 0) = \beta_1 + \beta_2 \tag{9.2.2}$$

Mean salary of public school teachers in the South:

$$E(Y_i \mid D_{2i} = 0, D_{3i} = 1) = \beta_1 + \beta_3 \tag{9.2.3}$$

You might wonder how we find out the mean salary of teachers in the West. If you guessed that this is equal to $\beta_1$, you would be absolutely right, for

Mean salary of public school teachers in the West:

$$E(Y_i \mid D_{2i} = 0, D_{3i} = 0) = \beta_1 \tag{9.2.4}$$



FIGURE 9.1

Average salary (in dollars) of public school teachers in three regions.


In other words, the mean salary of public school teachers in the West is given by the intercept, $\beta_1$, in the multiple regression (9.2.1), and the "slope" coefficients $\beta_2$ and $\beta_3$ tell by how much the mean salaries of teachers in the Northeast and North Central and in the South differ from the mean salary of teachers in the West. But how do we know if these differences are statistically significant? Before we answer this question, let us present the results based on the regression (9.2.1). Using the data given in Table 9.1, we obtain the following results:


$$\hat{Y}_i = 48{,}014.615 + 1{,}524.099\,D_{2i} - 1{,}721.027\,D_{3i} \tag{9.2.5}$$

se = (1857.204)   (2363.139)   (2467.151)
t = (25.853)   (0.645)   (-0.698)
(0.0000)*   (0.5220)*   (0.4888)*   $R^2 = 0.0440$

where * indicates the p values.

As these regression results show, the mean salary of teachers in the West is about $48,015, that of teachers in the Northeast and North Central is higher by about $1,524, and that of teachers in the South is lower by about $1,721. The actual mean salaries in the last two regions can be easily obtained by adding these differential salaries to the mean salary of teachers in the West, as shown in Eqs. (9.2.2) and (9.2.3). Doing this, we will find that the mean salaries in the latter two regions are about $49,539 and $46,294.

But how do we know that these mean salaries are statistically different from the mean 
salary of teachers in the West, the comparison category? That is easy enough. All we have 
to do is to find out if each of the "slope" coefficients in Eq. (9.2.5) is statistically significant. 
As can be seen from this regression, the estimated slope coefficient for Northeast and 
North Central is not statistically significant, as its p value is 52 percent, and that of the 
South is also not statistically significant, as the p value is about 49 percent. Therefore, the 
overall conclusion is that statistically the mean salaries of public school teachers in the West, 
the Northeast and North Central, and the South are about the same. Diagrammatically, the 
situation is shown in Figure 9.1. 

A caution is in order in interpreting these differences. The dummy variables will simply point out the differences, if they exist, but they do not suggest the reasons for the differences. Differences in educational levels, cost of living indexes, gender, and race may all have some effect on the observed differences. Therefore, unless we take into account all the other variables that may affect a teacher's salary, we will not be able to pin down the cause(s) of the differences.

From the preceding discussion, it is clear that all one has to do is see if the coefficients 
attached to the various dummy variables are individually statistically significant. This example 
also shows how easy it is to incorporate qualitative, or dummy, regressors in the regression 
models. 
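To check Eq. (9.2.5), the Python sketch below (assuming NumPy is available) regresses the Table 9.1 salaries on an intercept and the two regional dummies. The salaries are typed in from Table 9.1 and grouped by region; the variable names are illustrative. Because the model contains only dummies, the intercept equals the West sample mean and each dummy coefficient equals a regional mean minus the West mean, which is why the coefficients match those reported above.

```python
import numpy as np

# Table 9.1 average salaries (dollars), grouped by region
NE_NC = [60822, 58246, 47831, 43130, 43334, 41596, 58624, 54895, 49634, 41839, 42044,
         46527, 59920, 58537, 38822, 51937, 54970, 55956, 35378, 48370, 47901]  # D2 = 1
SOUTH = [49905, 43646, 42816, 56927, 40182, 46410, 42379, 44133, 43816, 44897, 44727,
         40531, 43389, 44245, 54680, 59000, 45308]                              # D3 = 1
WEST = [54658, 45941, 63640, 45833, 51922, 42798, 41225, 45342, 42780, 50911, 40566,
        47882, 50692]                                                            # benchmark

y = np.array(NE_NC + SOUTH + WEST, dtype=float)
d2 = np.array([1.0] * len(NE_NC) + [0.0] * (len(SOUTH) + len(WEST)))
d3 = np.array([0.0] * len(NE_NC) + [1.0] * len(SOUTH) + [0.0] * len(WEST))

# Eq. (9.2.1): intercept plus m - 1 = 2 dummies
X = np.column_stack([np.ones_like(y), d2, d3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Conventional OLS standard errors and t ratios
resid = y - X @ beta
n, k = X.shape
sigma2 = (resid @ resid) / (n - k)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

print(np.round(beta, 3))  # about 48014.615, 1524.099, -1721.027
print(np.round(se, 3), np.round(beta / se, 3))
```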


[Figure 9.1: mean salaries of about $49,539 (Northeast and North Central), $48,015 (West), and $46,294 (South).]
Caution in the Use of Dummy Variables

Although they are easy to incorporate in the regression models, one must use the dummy 
variables carefully. In particular, consider the following aspects: 

1. In Example 9.1, to distinguish the three regions, we used only two dummy variables, $D_2$ and $D_3$. Why did we not use three dummies to distinguish the three regions? Suppose we do that and write the model (9.2.1) as:

$$Y_i = \alpha + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i \tag{9.2.6}$$

where $D_{1i}$ takes a value of 1 for states in the West and 0 otherwise. Thus, we now have a dummy variable for each of the three geographical regions. Using the data in Table 9.1, if you were to run the regression (9.2.6), the computer would “refuse” to run the regression (try it).⁶ Why? The reason is that in the setup of Eq. (9.2.6) where you have a dummy variable for each category or group and also an intercept, you have a case of perfect collinearity, that is, exact linear relationships among the variables. Why? Refer to Table 9.1. Imagine that now we add the $D_1$ column, taking the value of 1 whenever a state is in the West and 0 otherwise. Now if you add the three D columns horizontally, you will obtain a column that has 51 ones in it. But since the regressor attached to the intercept $\alpha$ is (implicitly) 1 for each observation, you will have a column that also contains 51 ones. In other words, the sum of the three D columns will simply reproduce the intercept column, thus leading to perfect collinearity. In this case, estimation of the model (9.2.6) is impossible.

The message here is: If a qualitative variable has m categories, introduce only (m - 1) dummy variables. In our example, since the qualitative variable “region” has three categories, we introduced only two dummies. If you do not follow this rule, you will fall into what is called the dummy variable trap, that is, the situation of perfect collinearity or perfect multicollinearity, if there is more than one exact relationship among the variables. This rule also applies if we have more than one qualitative variable in the model, an example of which is presented later. Thus we should restate the preceding rule as: For each qualitative regressor, the number of dummy variables introduced must be one less than the number of categories of that variable. Thus, if in Example 9.1 we had information about the gender of the teacher, we would use an additional dummy variable (but not two) taking a value of 1 for female and 0 for male or vice versa.

2. The category for which no dummy variable is assigned is known as the base, 
benchmark, control, comparison, reference, or omitted category. And all comparisons 
are made in relation to the benchmark category. 

3. The intercept value ($\beta_1$) represents the mean value of the benchmark category. In Example 9.1, the benchmark category is the Western region. Hence, in the regression (9.2.5) the intercept value of about 48,015 represents the mean salary of teachers in the Western states.

4. The coefficients attached to the dummy variables in Eq. (9.2.1) are known as the 
differential intercept coefficients because they tell by how much the value of the category 
that receives the value of 1 differs from the intercept coefficient of the benchmark category. 
For example, in Eq. (9.2.5), the value of about 1,524 tells us that the mean salary of teachers 
in the Northeast or North Central is larger by about $1,524 than the mean salary of about 
$48,015 for the benchmark category, the West. 


⁶Actually, you will get a message saying that the data matrix is singular.
5. If a qualitative variable has more than one category, as in our illustrative example, the choice of the benchmark category is strictly up to the researcher. Sometimes the choice of the benchmark is dictated by the particular problem at hand. In our illustrative example, we could have chosen the South as the benchmark category. In that case the regression results given in Eq. (9.2.5) will change, because now all comparisons are made in relation to the South. Of course, this will not change the overall conclusion of our example (why?). In this case, the intercept value will be about $46,294, which is the mean salary of teachers in the South.

6. We warned above about the dummy variable trap. There is a way to circumvent this trap by introducing as many dummy variables as the number of categories of that variable, provided we do not introduce the intercept in such a model. Thus, if we drop the intercept term from Eq. (9.2.6), and consider the following model,

$$Y_i = \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i \tag{9.2.7}$$

we do not fall into the dummy variable trap, as there is no longer perfect collinearity. But make sure that when you run this regression, you use the no-intercept option in your regression package.

How do we interpret regression (9.2.7)? If you take the expectation of Eq. (9.2.7), you will find that:

$\beta_1$ = mean salary of teachers in the West
$\beta_2$ = mean salary of teachers in the Northeast and North Central
$\beta_3$ = mean salary of teachers in the South

In other words, with the intercept suppressed, and allowing a dummy variable for each category, we obtain directly the mean values of the various categories. The results of Eq. (9.2.7) for our illustrative example are as follows:

$$\hat{Y}_i = 48{,}014.62\,D_{1i} + 49{,}538.71\,D_{2i} + 46{,}293.59\,D_{3i} \tag{9.2.8}$$

se = (1857.204)   (1461.240)   (1624.077)
t = (25.853)*   (33.902)*   (28.505)*
$R^2 = 0.044$

where * indicates that the p values of these t ratios are very small.

As you can see, the dummy coefficients give directly the mean (salary) values in the three regions: West, Northeast and North Central, and South.

7. Which is a better method of introducing a dummy variable: (1) introduce a dummy for each category and omit the intercept term or (2) include the intercept term and introduce only (m - 1) dummies, where m is the number of categories of the qualitative variable? As Kennedy notes:

Most researchers find the equation with an intercept more convenient because it allows them to address more easily the questions in which they usually have the most interest, namely, whether or not the categorization makes a difference, and if so, by how much. If the categorization does make a difference, by how much is measured directly by the dummy variable coefficient estimates. Testing whether or not the categorization is relevant can be done by running a t test of a dummy variable coefficient against zero (or, to be more general, an F test on the appropriate set of dummy variable coefficient estimates).⁷


⁷Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 223.
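Points 1 to 6 can be seen together in one short Python sketch (assuming NumPy is available; the salaries are typed in from Table 9.1 and the variable names are illustrative). Adding an intercept to a full set of three regional dummies gives a design matrix of rank 3 with 4 columns, the dummy variable trap. Dropping the intercept, as in Eq. (9.2.7), returns the three regional means. Keeping the intercept with West as the omitted category returns the West mean plus differential intercepts that add back to the other two means.

```python
import numpy as np

# Table 9.1 average salaries (dollars) by region
WEST = [54658, 45941, 63640, 45833, 51922, 42798, 41225, 45342, 42780, 50911, 40566,
        47882, 50692]
NE_NC = [60822, 58246, 47831, 43130, 43334, 41596, 58624, 54895, 49634, 41839, 42044,
         46527, 59920, 58537, 38822, 51937, 54970, 55956, 35378, 48370, 47901]
SOUTH = [49905, 43646, 42816, 56927, 40182, 46410, 42379, 44133, 43816, 44897, 44727,
         40531, 43389, 44245, 54680, 59000, 45308]

y = np.array(WEST + NE_NC + SOUTH, dtype=float)
group = [0] * len(WEST) + [1] * len(NE_NC) + [2] * len(SOUTH)
D = np.eye(3)[group]          # columns: D1 (West), D2 (Northeast & North Central), D3 (South)
ones = np.ones((len(y), 1))

# Point 1: intercept plus a dummy for every category -> perfect collinearity
X_trap = np.hstack([ones, D])
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # rank 3 with 4 columns

# Point 6: no intercept, one dummy per category, as in Eq. (9.2.7)
b_no_int, *_ = np.linalg.lstsq(D, y, rcond=None)

# Points 2-4: intercept plus m - 1 dummies, West as the benchmark, as in Eq. (9.2.1)
X_base = np.hstack([ones, D[:, 1:]])
b_base, *_ = np.linalg.lstsq(X_base, y, rcond=None)

print(np.round(b_no_int, 2))  # the three regional means
print(np.round(b_base, 3))    # West mean, then the two differential intercepts
```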
9.3 ANOVA Models with Two Qualitative Variables

In the previous section we considered an ANOVA model with one qualitative variable with three categories. In this section we consider another ANOVA model, but with two qualitative variables, and bring out some additional points about dummy variables.


EXAMPLE 9.2

Hourly Wages in Relation to Marital Status and Region of Residence


From a sample of 528 persons in May 1985, the following regression results were obtained.⁸

$$\hat{Y}_i = 8.8148 + 1.0997\,D_{2i} - 1.6729\,D_{3i} \tag{9.3.1}$$

se = (0.4015)   (0.4642)   (0.4854)
t = (21.9528)   (2.3688)   (-3.4462)
(0.0000)*   (0.0182)*   (0.0006)*   $R^2 = 0.0322$

where $Y$ = hourly wage (in dollars)
$D_2$ = marital status: 1 = married, 0 = otherwise
$D_3$ = region of residence: 1 = South, 0 = otherwise

and * denotes the p values.

In this example we have two qualitative regressors, each with two categories. Hence we have assigned a single dummy variable to each qualitative regressor.

Which is the benchmark category here? Obviously, it is unmarried, non-South residence. In other words, unmarried persons who do not live in the South are the omitted category. Therefore, all comparisons are made in relation to this group. The mean hourly wage in this benchmark is about $8.81. Compared with this, the average hourly wage of those who are married is higher by about $1.10, for an actual average wage of $9.91 (= 8.81 + 1.10). By contrast, for those who live in the South, the average hourly wage is lower by about $1.67, for an actual average hourly wage of $7.14.

Are the preceding average hourly wages statistically different compared to the base 
category? They are, for all the differential intercepts are statistically significant, as their p 
values are quite low. 
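The mean wage implied by Eq. (9.3.1) for each combination of the two attributes is simply the intercept plus whichever differential intercepts apply. The short Python sketch below tabulates all four cells. Note that the married-and-South cell follows only from the additive form of Eq. (9.3.1), which contains no interaction between the two dummies; that restriction is part of the model as written, not something the data have confirmed.

```python
# Coefficients from Eq. (9.3.1)
b1, b2, b3 = 8.8148, 1.0997, -1.6729

# Additive model: each cell mean is the intercept plus the applicable differentials
cell_means = {
    ("unmarried", "non-South"): b1,
    ("married", "non-South"): b1 + b2,
    ("unmarried", "South"): b1 + b3,
    ("married", "South"): b1 + b2 + b3,
}
for cell, wage in cell_means.items():
    print(cell, round(wage, 4))
```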

The point to note about this example is this: Once you go beyond one qualitative 
variable, you have to pay close attention to the category that is treated as the base category, 
since all comparisons are made in relation to that category. This is especially important when 
you have several qualitative regressors, each with several categories. But the mechanics of 
introducing several qualitative variables should be clear by now. 


⁸The data are obtained from the data disk in Arthur S. Goldberger, Introductory Econometrics, Harvard University Press, Cambridge, Mass., 1998. We have already considered these data in Chapter 2.

9.4 Regression with a Mixture of Quantitative and Qualitative Regressors: The ANCOVA Models

ANOVA models of the type discussed in the preceding two sections, although common in fields such as sociology, psychology, education, and market research, are not that common in economics. Typically, in most economic research a regression model contains some explanatory variables that are quantitative and some that are qualitative. Regression models containing an admixture of quantitative and qualitative variables are called analysis of covariance (ANCOVA) models. ANCOVA models are an extension of the ANOVA models in that they provide a method of statistically controlling the effects of quantitative regressors, called covariates or control variables, in a model that includes both quantitative and qualitative, or dummy, regressors. We now illustrate the ANCOVA models.


EXAMPLE 9.3

Teachers’ Salary in Relation to Region and Spending on Public School per Pupil


To motivate the analysis, let us reconsider Example 9.1 by maintaining that the average 
salary of public school teachers may not be different in the three regions if we take into 
account any variables that cannot be standardized across the regions. Consider, for 
example, the variable expenditure on public schools by local authorities, as public education 
is primarily a local and state question. To see if this is the case, we develop the following 
model: 


$$Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + \beta_4 X_i + u_i \tag{9.4.1}$$

where $Y_i$ = average annual salary of public school teachers in state $i$ (in dollars)
$X_i$ = spending on public school per pupil (in dollars)
$D_{2i}$ = 1, if the state is in the Northeast or North Central
= 0, otherwise
$D_{3i}$ = 1, if the state is in the South
= 0, otherwise

The data on X are given in Table 9.1. Keep in mind that we are treating the West as the 
benchmark category. Also, note that besides the two qualitative regressors, we have a 
quantitative variable, X, which in the context of the ANCOVA models is known as a 
covariate, as noted earlier. 

From the data in Table 9.1, the results of the model (9.4.1) are as follows: 

$$\hat{Y}_i = 28{,}694.918 - 2{,}954.127\,D_{2i} - 3{,}112.194\,D_{3i} + 2.3404\,X_i \tag{9.4.2}$$

se = (3262.521)   (1862.576)   (1819.873)   (0.3592)
t = (8.795)*   (-1.586)**   (-1.710)**   (6.515)*
$R^2 = 0.4977$

where * indicates p values less than 5 percent, and ** indicates p values greater than 5 percent.

As these results suggest, ceteris paribus, as public expenditure per pupil goes up by a dollar, on average, a public school teacher's salary goes up by about $2.34. Controlling for spending on education, we now see that the differential intercept coefficient is not significant for either the Northeast and North Central region or for the South. These results are different from those of Eq. (9.2.5): the Northeast and North Central differential changes sign, from about +$1,524 to about -$2,954, and the South differential grows from about -$1,721 to about -$3,112. But this should not be surprising, for in Eq. (9.2.5) we did not account for the covariate, differences in per pupil public spending on education. Diagrammatically, we have the situation shown in Figure 9.2.

Note that although we have shown three regression lines for the three regions, statistically the regression lines are the same for all three regions. Also note that the three regression lines are drawn parallel. (Why?)
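The ANCOVA model (9.4.1) can be fitted with the same tools as before. The Python sketch below (assuming NumPy is available) stacks the Table 9.1 salary and per pupil spending figures with the two regional dummies, West again being the benchmark, and prints the coefficients and $R^2$ for comparison with Eq. (9.4.2). The data are typed in by hand, so any transcription error in the spending column would change the estimates; the variable names are illustrative. Because a single coefficient on $X_i$ is shared by all three regions, the three fitted lines differ only in their intercepts and are therefore parallel.

```python
import numpy as np

# Table 9.1 as (salary, spending per pupil) pairs in dollars, grouped by region
NE_NC = [(60822, 12436), (58246, 9275), (47831, 8935), (43130, 7807), (43334, 8373),
         (41596, 11285), (58624, 12596), (54895, 9880), (49634, 9675), (41839, 7840),
         (42044, 7900), (46527, 10206), (59920, 13781), (58537, 13551), (38822, 7807),
         (51937, 10034), (54970, 10711), (55956, 11089), (35378, 7911), (48370, 12475),
         (47901, 9965)]
SOUTH = [(49905, 8534), (43646, 8300), (42816, 8519), (56927, 9771), (40182, 7215),
         (46410, 7675), (42379, 6944), (44133, 8377), (43816, 6979), (44897, 7547),
         (44727, 9275), (40531, 9886), (43389, 7706), (44245, 8402), (54680, 12036),
         (59000, 15508), (45308, 7762)]
WEST = [(54658, 10171), (45941, 5585), (63640, 8486), (45833, 8861), (51922, 9879),
        (42798, 7042), (41225, 8361), (45342, 6755), (42780, 8622), (50911, 8649),
        (40566, 5347), (47882, 7958), (50692, 11596)]

rows = ([(s, x, 1, 0) for s, x in NE_NC]
        + [(s, x, 0, 1) for s, x in SOUTH]
        + [(s, x, 0, 0) for s, x in WEST])
y, spend, d2, d3 = np.array(rows, dtype=float).T

# Eq. (9.4.1): intercept, two regional dummies, and the spending covariate
X = np.column_stack([np.ones_like(y), d2, d3, spend])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
print(np.round(beta, 4), round(float(r2), 4))
```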
FIGURE 9.2

Public school teacher’s salary (Y) in relation to per pupil expenditure on education (X).



9.5 The Dummy Variable Alternative to the Chow Test⁹

In Section 8.7 we discussed the Chow test to examine the structural stability of a regression model. The example we discussed there related to the relationship between savings and income in the United States over the period 1970-1995. We divided the sample period into two, 1970-1981 and 1982-1995, and showed on the basis of the Chow test that there was a difference in the regression of savings on income between the two periods.

However, we could not tell whether the difference in the two regressions was because of differences in the intercept terms or the slope coefficients or both. Very often this knowledge itself is very useful.

Referring to Eqs. (8.7.1) and (8.7.2), we see that there are four possibilities, which we 
illustrate in Figure 9.3. 

1. Both the intercept and the slope coefficients are the same in the two regressions. This, the case of coincident regressions, is shown in Figure 9.3a.

2. Only the intercepts in the two regressions are different but the slopes are the same. This is the case of parallel regressions, which is shown in Figure 9.3b.

3. The intercepts in the two regressions are the same, but the slopes are different. This is the situation of concurrent regressions (Figure 9.3c).

4. Both the intercepts and slopes in the two regressions are different. This is the case of dissimilar regressions, which is shown in Figure 9.3d.

The multistep Chow test procedure discussed in Section 8.7, as noted earlier, tells us only whether two (or more) regressions are different, without telling us what the source of the difference is.


9 The material in this section draws on the author's articles, "Use of Dummy Variables in Testing for Equality between Sets of Coefficients in Two Linear Regressions: A Note," and "Use of Dummy Variables ... A Generalization," both published in the American Statistician, vol. 24, nos. 1 and 5, 1970, pp. 50-52 and 18-21.
FIGURE 9.3 Plausible savings-income regressions: (a) coincident regressions; (b) parallel regressions. (Vertical axis: savings.)

The source of difference, if any, can be pinned down by pooling all the observations (26 in 
all) and running just one multiple regression as shown below: 10 

$$Y_t = \alpha_1 + \alpha_2 D_t + \beta_1 X_t + \beta_2 (D_t X_t) + u_t \qquad (9.5.1)$$

where Y = savings
X = income
t = time
D = 1 for observations in 1982-1995
  = 0 otherwise (i.e., for observations in 1970-1981)

Table 9.2 shows the structure of the data matrix. 

To see the implications of Eq. (9.5.1), assume, as usual, that $E(u_t) = 0$. We then obtain:

Mean savings function for 1970-1981:

$$E(Y_t \mid D_t = 0, X_t) = \alpha_1 + \beta_1 X_t \qquad (9.5.2)$$

Mean savings function for 1982-1995:

$$E(Y_t \mid D_t = 1, X_t) = (\alpha_1 + \alpha_2) + (\beta_1 + \beta_2) X_t \qquad (9.5.3)$$

The reader will notice that these are the same functions as Eqs. (8.7.1) and (8.7.2), with $\lambda_1 = \alpha_1$, $\lambda_2 = \beta_1$, $\gamma_1 = (\alpha_1 + \alpha_2)$, and $\gamma_2 = (\beta_1 + \beta_2)$. Therefore, estimating Eq. (9.5.1) is equivalent to estimating the two individual savings functions in Eqs. (8.7.1) and (8.7.2).

10 As in the Chow test, the pooling technique assumes homoscedasticity, that is, $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
TABLE 9.2 Savings and Income Data, United States, 1970-1995

Observation   Savings    Income   Dum
1970             61.0     727.1     0
1971             68.6     790.2     0
1972             63.6     855.3     0
1973             89.6     965.0     0
1974             97.6    1054.2     0
1975            104.4    1159.2     0
1976             96.4    1273.0     0
1977             92.5    1401.4     0
1978            112.6    1580.1     0
1979            130.1    1769.5     0
1980            161.8    1973.3     0
1981            199.1    2200.2     0
1982            205.5    2347.3     1
1983            167.0    2522.4     1
1984            235.7    2810.0     1
1985            206.2    3002.0     1
1986            196.5    3187.6     1
1987            168.4    3363.1     1
1988            189.1    3640.8     1
1989            187.8    3894.5     1
1990            208.7    4166.8     1
1991            246.4    4343.7     1
1992            272.6    4613.7     1
1993            214.4    4790.2     1
1994            189.4    5021.7     1
1995            249.3    5320.8     1

Savings and income figures are in billions of dollars.
Source: Economic Report of the President, 1997, Table B-28, p. 332.

EXAMPLE 9.4 Structural Differences in the U.S. Savings-Income Regression, the Dummy Variable Approach


In Eq. (9.5.1), $\alpha_2$ is the differential intercept, as previously, and $\beta_2$ is the differential slope coefficient (also called the slope drifter), indicating by how much the slope coefficient of the second period's savings function (the category that receives the dummy value of 1) differs from that of the first period. Notice how the introduction of the dummy variable D in the interactive, or multiplicative, form (D multiplied by X) enables us to differentiate between the slope coefficients of the two periods, just as the introduction of the dummy variable in the additive form enabled us to distinguish between the intercepts of the two periods.


Before we proceed further, let us first present the regression results of model (9.5.1) 
applied to the U.S. savings-income data. 

$$\hat{Y}_t = 1.0161 + 152.4786\,D_t + 0.0803\,X_t - 0.0655\,(D_t X_t) \qquad (9.5.4)$$

se = (20.1648) (33.0824) (0.0144) (0.0159)

t = (0.0504)** (4.6090)* (5.5413)* (-4.0963)*

R² = 0.8819

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.

As these regression results show, both the differential intercept and slope coefficients are statistically significant, strongly suggesting that the savings-income regressions for the two time periods are different, as in Figure 9.3d.

From Eq. (9.5.4), we can derive the estimated counterparts of Eqs. (9.5.2) and (9.5.3), which are:

Savings-income regression, 1970-1981:

$$\hat{Y}_t = 1.0161 + 0.0803\,X_t \qquad (9.5.5)$$

Savings-income regression, 1982-1995:

$$\hat{Y}_t = (1.0161 + 152.4786) + (0.0803 - 0.0655)\,X_t = 153.4947 + 0.0148\,X_t \qquad (9.5.6)$$

These are precisely the results we obtained in Eqs. (8.7.1a) and (8.7.2a), which should not be surprising. These regressions are already shown in Figure 8.3.
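To see this equivalence numerically, the sketch below fits Eq. (9.5.1) to the Table 9.2 data and compares the implied subperiod intercepts and slopes with separate OLS fits for 1970-1981 and 1982-1995. It assumes Python with NumPy (not necessarily the software behind the reported results); the helper names are illustrative. Because the model is fully interacted, the two sets of estimates should agree up to floating-point error.

```python
import numpy as np

# Table 9.2: U.S. savings and income, billions of dollars, 1970-1995.
SAVINGS = np.array([61.0, 68.6, 63.6, 89.6, 97.6, 104.4, 96.4, 92.5, 112.6, 130.1,
                    161.8, 199.1, 205.5, 167.0, 235.7, 206.2, 196.5, 168.4, 189.1,
                    187.8, 208.7, 246.4, 272.6, 214.4, 189.4, 249.3])
INCOME = np.array([727.1, 790.2, 855.3, 965.0, 1054.2, 1159.2, 1273.0, 1401.4,
                   1580.1, 1769.5, 1973.3, 2200.2, 2347.3, 2522.4, 2810.0, 3002.0,
                   3187.6, 3363.1, 3640.8, 3894.5, 4166.8, 4343.7, 4613.7, 4790.2,
                   5021.7, 5320.8])
YEARS = np.arange(1970, 1996)
D = (YEARS >= 1982).astype(float)  # 1 for 1982-1995, 0 for 1970-1981


def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def pooled_dummy_regression(y, x, d):
    """Fit Y = a1 + a2*D + b1*X + b2*(D*X) + u, as in Eq. (9.5.1)."""
    X = np.column_stack([np.ones_like(x), d, x, d * x])
    return ols(X, y)


def separate_regression(y, x):
    """Fit Y = c + s*X + u on a single subperiod."""
    return ols(np.column_stack([np.ones_like(x), x]), y)


if __name__ == "__main__":
    a1, a2, b1, b2 = pooled_dummy_regression(SAVINGS, INCOME, D)
    print("pooled (a1, a2, b1, b2):", np.round([a1, a2, b1, b2], 4))
    early = separate_regression(SAVINGS[D == 0], INCOME[D == 0])
    late = separate_regression(SAVINGS[D == 1], INCOME[D == 1])
    print("1970-1981 separate:", np.round(early, 4), " implied:", np.round([a1, b1], 4))
    print("1982-1995 separate:", np.round(late, 4), " implied:", np.round([a1 + a2, b1 + b2], 4))
```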

The advantages of the dummy variable technique (i.e., estimating Eq. [9.5.1]) over the Chow test (i.e., estimating the three regressions [8.7.1], [8.7.2], and [8.7.3]) can now be seen readily:

1. We need to run only a single regression because the individual regressions can easily be 
derived from it in the manner indicated by equations (9.5.2) and (9.5.3). 

2. The single regression (9.5.1) can be used to test a variety of hypotheses. Thus if the differential intercept coefficient $\alpha_2$ is statistically insignificant, we may not reject the hypothesis that the two regressions have the same intercept, that is, that the two regressions are concurrent (see Figure 9.3c). Similarly, if the differential slope coefficient $\beta_2$ is statistically insignificant but $\alpha_2$ is significant, we may not reject the hypothesis that the two regressions have the same slope, that is, that the two regression lines are parallel (cf. Figure 9.3b). The stability of the entire regression (i.e., $\alpha_2 = \beta_2 = 0$, simultaneously) can be tested by the usual F test (recall the restricted least-squares F test). If this hypothesis is not rejected, the regression lines will be coincident, as shown in Figure 9.3a.

3. The Chow test does not explicitly tell us which coefficient, the intercept or the slope, is different, or whether (as in this example) both are different in the two periods. That is, one can obtain a significant Chow test because only the slope is different, only the intercept is different, or both are different. In other words, we cannot tell, via the Chow test, which of the four possibilities depicted in Figure 9.3 holds in a given instance. In this respect, the dummy variable approach has a distinct advantage, for it not only tells us whether the two regressions are different but also pinpoints the source(s) of the difference: whether it is due to the intercept, the slope, or both. In practice, knowing that two regressions differ in this or that coefficient is as important as, if not more important than, the plain knowledge that they are different.

4. Finally, since pooling (i.e., including all the observations in one regression) increases the 
degrees of freedom, it may improve the relative precision of the estimated parameters. 
Of course, keep in mind that every addition of a dummy variable will consume one degree 
of freedom. 
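The joint test of $\alpha_2 = \beta_2 = 0$ mentioned in point 2 can be computed from the residual sums of squares of the unrestricted model (9.5.1) and of the restricted model that fits one intercept and one slope to all 26 years: F = [(RSS_R - RSS_UR)/m] / [RSS_UR/(n - k)], with m = 2 restrictions and k = 4 parameters. The sketch below assumes Python with NumPy and uses illustrative function names. Because the unrestricted RSS of Eq. (9.5.1) equals the sum of the two subperiod RSS values, this F is numerically the same as the Chow statistic of Section 8.7. With n = 26 it is compared with F(2, 22), whose 5 percent critical value is roughly 3.44.

```python
import numpy as np

# Table 9.2: U.S. savings and income, billions of dollars, 1970-1995.
SAVINGS = np.array([61.0, 68.6, 63.6, 89.6, 97.6, 104.4, 96.4, 92.5, 112.6, 130.1,
                    161.8, 199.1, 205.5, 167.0, 235.7, 206.2, 196.5, 168.4, 189.1,
                    187.8, 208.7, 246.4, 272.6, 214.4, 189.4, 249.3])
INCOME = np.array([727.1, 790.2, 855.3, 965.0, 1054.2, 1159.2, 1273.0, 1401.4,
                   1580.1, 1769.5, 1973.3, 2200.2, 2347.3, 2522.4, 2810.0, 3002.0,
                   3187.6, 3363.1, 3640.8, 3894.5, 4166.8, 4343.7, 4613.7, 4790.2,
                   5021.7, 5320.8])
D = (np.arange(1970, 1996) >= 1982).astype(float)  # 1 for 1982-1995


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def joint_dummy_f_test(y, x, d):
    """Restricted least-squares F statistic for H0: alpha2 = beta2 = 0 in Eq. (9.5.1).

    Returns (F, numerator df, denominator df)."""
    n = len(y)
    ones = np.ones_like(x)
    rss_ur = rss(np.column_stack([ones, d, x, d * x]), y)  # unrestricted model
    rss_r = rss(np.column_stack([ones, x]), y)             # restricted: one common line
    m, k = 2, 4                                            # restrictions, parameters
    f_stat = ((rss_r - rss_ur) / m) / (rss_ur / (n - k))
    return f_stat, m, n - k


if __name__ == "__main__":
    f_stat, df1, df2 = joint_dummy_f_test(SAVINGS, INCOME, D)
    print(f"F({df1}, {df2}) = {f_stat:.2f}")
```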


9.6 Interaction Effects Using Dummy Variables 

Dummy variables are a flexible tool that can handle a variety of interesting problems. To see 
this, consider the following model: 

$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i \qquad (9.6.1)$$
where Y = hourly wage in dollars

X = education (years of schooling)

$D_2$ = 1 if female, 0 otherwise

$D_3$ = 1 if nonwhite and non-Hispanic, 0 otherwise

In this model gender and race are qualitative regressors and education is a quantitative regressor. 11 Implicit in this model is the assumption that the differential effect of the gender dummy $D_2$ is constant across the two categories of race and the differential effect of the race dummy $D_3$ is also constant across the two sexes. That is to say, if the mean wage is higher for males than for females, this is so whether they are nonwhite/non-Hispanic or not. Likewise, if, say, nonwhite/non-Hispanics have lower mean wages, this is so whether they are females or males.

In many applications such an assumption may be untenable. A female nonwhite/non-Hispanic may earn lower wages than a male nonwhite/non-Hispanic. In other words, there may be interaction between the two qualitative variables $D_2$ and $D_3$. Therefore their effect on mean Y may not be simply additive as in Eq. (9.6.1) but multiplicative as well, as in the following model.

$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i \qquad (9.6.2)$$

where the variables are as defined for model (9.6.1). 

From Eq. (9.6.2), we obtain: 

$$E(Y_i \mid D_{2i} = 1, D_{3i} = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i \qquad (9.6.3)$$
which is the mean hourly wage function for female nonwhite/non-Hispanic workers.
Observe that 

$\alpha_2$ = differential effect of being a female

$\alpha_3$ = differential effect of being a nonwhite/non-Hispanic

$\alpha_4$ = differential effect of being a female nonwhite/non-Hispanic

which shows that the mean hourly wages of female nonwhite/non-Hispanics are different (by $\alpha_4$) from the mean hourly wages of females or nonwhite/non-Hispanics. If, for instance, all three differential dummy coefficients are negative, this would imply that female nonwhite/non-Hispanic workers earn much lower mean hourly wages than female or nonwhite/non-Hispanic workers as compared with the base category, which in the present example is male white or Hispanic.

Now the reader can see how the interaction dummy (i.e., the product of two qualitative 
or dummy variables) modifies the effect of the two attributes considered individually (i.e., 
additively). 
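Setting the two dummies in Eq. (9.6.2) to each of their four combinations, with $E(u_i) = 0$, gives the mean wage function of every group. The first three lines below follow directly from Eq. (9.6.2) in the same way as Eq. (9.6.3), which is repeated as the last line:

$$
\begin{aligned}
E(Y_i \mid D_{2i}=0, D_{3i}=0, X_i) &= \alpha_1 + \beta X_i && \text{male, white or Hispanic (base)}\\
E(Y_i \mid D_{2i}=1, D_{3i}=0, X_i) &= (\alpha_1 + \alpha_2) + \beta X_i && \text{female, white or Hispanic}\\
E(Y_i \mid D_{2i}=0, D_{3i}=1, X_i) &= (\alpha_1 + \alpha_3) + \beta X_i && \text{male, nonwhite/non-Hispanic}\\
E(Y_i \mid D_{2i}=1, D_{3i}=1, X_i) &= (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i && \text{female, nonwhite/non-Hispanic}
\end{aligned}
$$

Differencing these lines gives $\alpha_4 = [(\alpha_1+\alpha_2+\alpha_3+\alpha_4) - (\alpha_1+\alpha_3)] - [(\alpha_1+\alpha_2) - \alpha_1]$. In words, the interaction coefficient measures how much the female-male gap among nonwhite/non-Hispanic workers differs from the same gap among white or Hispanic workers.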


EXAMPLE 9.5 Average Hourly Earnings in Relation to Education, Gender, and Race

11 If we were to define education as less than high school, high school, and more than high school, we could then use two dummies to represent the three classes.


Let us first present the regression results based on model (9.6.1). Using the data that 
were used to estimate regression (9.3.1), we obtained the following results: 

$$\hat{Y}_i = -0.2610 - 2.3606\,D_{2i} - 1.7327\,D_{3i} + 0.8028\,X_i \qquad (9.6.4)$$

t = (-0.2357)** (-5.4873)* (-2.1803)* (9.9094)*

R² = 0.2032   n = 528

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.

The reader can check that the differential intercept coefficients are statistically significant, that they have the expected signs (why?), and that education has a strong positive effect on hourly wage, an unsurprising finding.

As Eq. (9.6.4) shows, ceteris paribus, the average hourly earnings of females are lower by about $2.36, and the average hourly earnings of nonwhite/non-Hispanic workers are also lower by about $1.73.

We now consider the results of model (9.6.2), which includes the interaction dummy. 

$$\hat{Y}_i = -0.26100 - 2.3606\,D_{2i} - 1.7327\,D_{3i} + 2.1289\,D_{2i}D_{3i} + 0.8028\,X_i \qquad (9.6.5)$$

t = (-0.2357)** (-5.4873)* (-2.1803)* (1.7420)** (9.9095)*

R² = 0.2032   n = 528

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.

As you can see, the two additive dummies are still statistically significant, but the interaction dummy is not significant at the conventional 5 percent level; its actual p value is about 8 percent. If you think this is a low enough probability, then the results of Eq. (9.6.5) can be interpreted as follows: Holding the level of education constant, if you add the three dummy coefficients you will obtain -1.964 (= -2.3606 - 1.7327 + 2.1289), which means that the mean hourly wage of nonwhite/non-Hispanic female workers is lower by about $1.96 than that of the base category (white or Hispanic males). This value lies between -2.3606 (gender difference alone) and -1.7327 (race difference alone).
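The arithmetic behind this interpretation can be laid out for all four groups. The short Python sketch below uses only the dummy coefficients reported in Eq. (9.6.5) and returns each group's shift in mean hourly wage relative to the base category, with education held fixed. The function name is illustrative.

```python
# Dummy coefficients from Eq. (9.6.5); base group = white or Hispanic males.
ALPHA2 = -2.3606  # female (D2)
ALPHA3 = -1.7327  # nonwhite/non-Hispanic (D3)
ALPHA4 = 2.1289   # interaction D2 * D3


def intercept_shift(female, nonwhite_nonhispanic):
    """Shift in mean hourly wage (dollars) relative to the base group, education fixed."""
    d2, d3 = int(female), int(nonwhite_nonhispanic)
    return ALPHA2 * d2 + ALPHA3 * d3 + ALPHA4 * d2 * d3


if __name__ == "__main__":
    for female in (False, True):
        for race in (False, True):
            print(f"female={female!s:5} nonwhite/non-Hispanic={race!s:5} "
                  f"shift = {intercept_shift(female, race):+.4f}")
```

The last row printed reproduces the figure of about -1.96 dollars for nonwhite/non-Hispanic female workers.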


The preceding example clearly reveals the role of interaction dummies when two or 
more qualitative regressors are included in the model. It is important to note that in the 
model (9.6.5) we are assuming that the rate of increase of hourly earnings with respect to 
education (of about 80 cents per additional year of schooling) remains constant across 
gender and race. But this may not be the case. If you want to test for this, you will have to 
introduce differential slope coefficients (see Exercise 9.25). 


9.7 The Use of Dummy Variables in Seasonal Analysis 

Many economic time series based on monthly or quarterly data exhibit seasonal patterns (regular oscillatory movements). Examples are sales of department stores at Christmas and other major holiday times, demand for money (or cash balances) by households at holiday times, demand for ice cream and soft drinks during summer, prices of crops right after harvesting season, demand for air travel, etc. Often it is desirable to remove the seasonal factor, or component, from a time series so that one can concentrate on the other components, such as the trend. 12 The process of removing the seasonal component from a time series is known as deseasonalization or seasonal adjustment, and the time series thus obtained is called the deseasonalized, or seasonally adjusted, time series. Important economic time series, such as the unemployment rate, the consumer price index (CPI), the producer’s price index (PPI), and the index of industrial production, are usually published in seasonally adjusted form.


12 A time series may contain four components: (1) seasonal, (2) cyclical, (3) trend, and (4) strictly 
random. 
TABLE 9.3 Quarterly Data on Appliance Sales (in thousands) and Expenditure on Durable Goods (1978-I to 1985-IV)

Quarter     DISH   DISP   FRIG   WASH     DUR
1978-I       841    798   1317   1271   252.6
1978-II      957    837   1615   1295   272.4
1978-III     999    821   1662   1313   270.9
1978-IV      960    858   1295   1150   273.9
1979-I       894    837   1271   1289   268.9
1979-II      851    838   1555   1245   262.9
1979-III     863    832   1639   1270   270.9
1979-IV      878    818   1238   1103   263.4
1980-I       792    868   1277   1273   260.6
1980-II      589    623   1258   1031   231.9
1980-III     657    662   1417   1143   242.7
1980-IV      699    822   1185   1101   248.6
1981-I       675    871   1196   1181   258.7
1981-II      652    791   1410   1116   248.4
1981-III     628    759   1417   1190   255.5
1981-IV      529    734    919   1125   240.4
1982-I       480    706    943   1036   247.7
1982-II      530    582   1175   1019   249.1
1982-III     557    659   1269   1047   251.8
1982-IV      602    837    973    918   262.0
1983-I       658    867   1102   1137   263.3
1983-II      749    860   1344   1167   280.0
1983-III     827    918   1641   1230   288.5
1983-IV      858   1017   1225   1081   300.5
1984-I       808   1063   1429   1326   312.6
1984-II      840    955   1699   1228   322.5
1984-III     893    973   1749   1297   324.3
1984-IV      950   1096   1117   1198   333.1
1985-I       838   1086   1242   1292   344.8
1985-II      884    990   1684   1342   350.3
1985-III     905   1028   1764   1323   369.1
1985-IV      909   1003   1328   1274   356.4

Note: DISH = dishwashers; DISP = garbage disposers; FRIG = refrigerators; WASH = washing machines; DUR = durable goods expenditure, billions of 1982 dollars.
Source: Business Statistics and Survey of Current Business, Department of Commerce.


There are several methods of deseasonalizing a time series, but we will consider only one of them, namely, the method of dummy variables. 13 To illustrate how dummy variables can be used to deseasonalize economic time series, consider the data given in Table 9.3. This table gives quarterly data for the years 1978-1985 on the sales of four major appliances (dishwashers, garbage disposers, refrigerators, and washing machines), all in thousands of units. The table also gives data on durable goods expenditure in billions of 1982 dollars.

To illustrate the dummy technique, we will consider only the sales of refrigerators over the sample period. But first let us look at the data, which are shown in Figure 9.4. This figure suggests that perhaps there is a seasonal pattern in the data associated with the various quarters. To see if this is the case, consider the following model:

$$Y_t = \alpha_1 D_{1t} + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + u_t \qquad (9.7.1)$$

where $Y_t$ = sales of refrigerators (in thousands) and the D's are the dummies, taking a value of 1 in the relevant quarter and 0 otherwise. Note that to avoid the dummy variable trap, we are assigning a dummy to each quarter of the year but omitting the intercept term. If there is any seasonal effect in a given quarter, that will be indicated by a statistically significant t value of the dummy coefficient for that quarter. 14

Notice that in Eq. (9.7.1) we are effectively regressing Y on an intercept, except that we allow for a different intercept in each season (i.e., quarter). As a result, the dummy coefficient of each quarter will give us the mean refrigerator sales in that quarter or season (why?).
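The answer to the "why?" can be checked directly. With a full set of quarterly dummies and no intercept, each column of the design matrix picks out one quarter, so least squares returns the quarterly sample means. The sketch below assumes Python with NumPy and a series that starts in 1978-I. It fits Eq. (9.7.1) to the refrigerator sales of Table 9.4 and also computes the coefficient standard errors, which come out identical because every quarter contributes 8 of the 32 observations. The function name is illustrative.

```python
import numpy as np

# Table 9.4: refrigerator sales (thousands), 1978-I through 1985-IV.
FRIG = np.array([1317, 1615, 1662, 1295, 1271, 1555, 1639, 1238,
                 1277, 1258, 1417, 1185, 1196, 1410, 1417, 919,
                 943, 1175, 1269, 973, 1102, 1344, 1641, 1225,
                 1429, 1699, 1749, 1117, 1242, 1684, 1764, 1328], dtype=float)
QUARTER = np.tile([1, 2, 3, 4], 8)  # quarter (1-4) of each observation


def seasonal_dummy_fit(y, quarter):
    """Regress y on D1..D4 with no intercept, as in Eq. (9.7.1).

    Returns (coefficients, standard errors)."""
    X = (quarter[:, None] == np.arange(1, 5)[None, :]).astype(float)  # n x 4 dummies
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)  # estimated error variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, se


if __name__ == "__main__":
    coef, se = seasonal_dummy_fit(FRIG, QUARTER)
    means = [FRIG[QUARTER == q].mean() for q in range(1, 5)]
    for q in range(4):
        print(f"Q{q + 1}: coef = {coef[q]:9.4f}  mean = {means[q]:9.4f}  se = {se[q]:.4f}")
```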


13 For the various methods of seasonal adjustment, see, for instance, Francis X. Diebold, Elements of 
Forecasting, 2d ed., South-Western Publishing, 2001, Chapter 5. 

14 Note a technical point. This method of assigning a dummy to each quarter assumes that the 
seasonal factor, if present, is deterministic and not stochastic. We will revisit this topic when we 
discuss time series econometrics in Part V of this book. 
FIGURE 9.4 Sales of refrigerators, 1978-1985 (quarterly).


EXAMPLE 9.6 Seasonality in Refrigerator Sales


TABLE 9.4 U.S. Refrigerator Sales (thousands), 1978-1985 (quarterly)
Source: Business Statistics and Survey of Current Business, Department of Commerce.


From the data on refrigerator sales given in Table 9.4, we obtain the following regression 
results: 

$$\hat{Y}_t = 1{,}222.1250\,D_{1t} + 1{,}467.5000\,D_{2t} + 1{,}569.7500\,D_{3t} + 1{,}160.0000\,D_{4t} \qquad (9.7.2)$$

t = (20.3720) (24.4622) (26.1666) (19.3364)

R² = 0.5317

Note: We have not given the standard errors of the estimated coefficients, as each standard error is equal to 59.9904. Each dummy takes the value 1 in exactly 8 of the 32 quarters (and 0 otherwise), so every coefficient is a mean of the same number of observations.

The estimated α coefficients in Eq. (9.7.2) represent the average, or mean, sales of refrigerators (in thousands of units) in each season (i.e., quarter). Thus, the average sale of refrigerators in the first quarter, in thousands of units, is about 1,222, that in the second quarter about 1,468, that in the third quarter about 1,570, and that in the fourth quarter about 1,160.


Quarter     FRIG     DUR   D2  D3  D4
1978-I      1317   252.6    0   0   0
1978-II     1615   272.4    1   0   0
1978-III    1662   270.9    0   1   0
1978-IV     1295   273.9    0   0   1
1979-I      1271   268.9    0   0   0
1979-II     1555   262.9    1   0   0
1979-III    1639   270.9    0   1   0
1979-IV     1238   263.4    0   0   1
1980-I      1277   260.6    0   0   0
1980-II     1258   231.9    1   0   0
1980-III    1417   242.7    0   1   0
1980-IV     1185   248.6    0   0   1
1981-I      1196   258.7    0   0   0
1981-II     1410   248.4    1   0   0
1981-III    1417   255.5    0   1   0
1981-IV      919   240.4    0   0   1
1982-I       943   247.7    0   0   0
1982-II     1175   249.1    1   0   0
1982-III    1269   251.8    0   1   0
1982-IV      973   262.0    0   0   1
1983-I      1102   263.3    0   0   0
1983-II     1344   280.0    1   0   0
1983-III    1641   288.5    0   1   0
1983-IV     1225   300.5    0   0   1
1984-I      1429   312.6    0   0   0
1984-II     1699   322.5    1   0   0
1984-III    1749   324.3    0   1   0
1984-IV     1117   333.1    0   0   1
1985-I      1242   344.8    0   0   0
1985-II     1684   350.3    1   0   0
1985-III    1764   369.1    0   1   0
1985-IV     1328   356.4    0   0   1

Note: FRIG = refrigerator sales, thousands.
DUR = durable goods expenditure, billions of 1982 dollars.
D2 = 1 in the second quarter, 0 otherwise.
D3 = 1 in the third quarter, 0 otherwise.
D4 = 1 in the fourth quarter, 0 otherwise.


Incidentally, instead of assigning a dummy for each quarter and suppressing the intercept term to avoid the dummy variable trap, we could assign only three dummies and include the intercept term. Suppose we treat the first quarter as the reference quarter and assign dummies to the second, third, and fourth quarters. This produces the following regression results (see Table 9.4 for the data setup):

$$\hat{Y}_t = 1{,}222.1250 + 245.3750\,D_{2t} + 347.6250\,D_{3t} - 62.1250\,D_{4t} \qquad (9.7.3)$$

t = (20.3720)* (2.8922)* (4.0974)* (-0.7322)**

R² = 0.5318

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.

Since we are treating the first quarter as the benchmark, the coefficients attached to the various dummies are now differential intercepts, showing by how much the average value of Y in the quarter that receives a dummy value of 1 differs from that of the benchmark quarter. Put differently, the coefficients on the seasonal dummies give the seasonal increase or decrease in the average value of Y relative to the base season. If you add the various differential intercept values to the benchmark average value of 1,222.125, you will get the average value for the various quarters. Doing so, you will reproduce exactly Eq. (9.7.2), except for rounding errors.
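The addition described above can be written out in a few lines of Python. The sketch uses only the point estimates reported in Eqs. (9.7.2) and (9.7.3); the dictionary and function names are illustrative.

```python
# Point estimates reported in Eq. (9.7.3): first quarter is the benchmark.
BENCHMARK = 1222.1250
DIFFERENTIALS = {2: 245.3750, 3: 347.6250, 4: -62.1250}

# Point estimates reported in Eq. (9.7.2): one dummy per quarter, no intercept.
EQ_9_7_2 = {1: 1222.1250, 2: 1467.5000, 3: 1569.7500, 4: 1160.0000}


def quarterly_means(benchmark, differentials):
    """Add each differential intercept to the benchmark quarter's mean."""
    means = {1: benchmark}
    for quarter, diff in differentials.items():
        means[quarter] = benchmark + diff
    return means


if __name__ == "__main__":
    for q, m in quarterly_means(BENCHMARK, DIFFERENTIALS).items():
        print(f"Q{q}: {m:.4f} (Eq. 9.7.2 reports {EQ_9_7_2[q]:.4f})")
```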

But now you will see the value of treating one quarter as the benchmark quarter, for Eq. (9.7.3) shows that the average value of Y for the fourth quarter is not statistically different from the average value for the first quarter, as the dummy coefficient for the fourth quarter is not statistically significant. Of course, your answer will change depending on which quarter you treat as the benchmark quarter, but the overall conclusion will not change.

How do we obtain the deseasonalized time series of refrigerator sales? This can be done easily. You estimate the values of Y from model (9.7.2) (or [9.7.3]) for each observation and subtract them from the actual values of Y, that is, you obtain $(Y_t - \hat{Y}_t)$, which are simply the residuals from the regression (9.7.2). We show them in Table 9.5. 15 To these residuals, we have to add the mean of the Y series to get the deseasonalized values.
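A minimal Python sketch of this procedure follows. It assumes the additive decomposition discussed in footnote 15 and a series that starts in the first quarter. Each quarter's mean (the fitted value from Eq. [9.7.2]) is subtracted, and the overall mean is added back. The function name `deseasonalize` is illustrative only.

```python
from statistics import mean

# Table 9.4: refrigerator sales (thousands), 1978-I through 1985-IV.
FRIG = [1317, 1615, 1662, 1295, 1271, 1555, 1639, 1238,
        1277, 1258, 1417, 1185, 1196, 1410, 1417, 919,
        943, 1175, 1269, 973, 1102, 1344, 1641, 1225,
        1429, 1699, 1749, 1117, 1242, 1684, 1764, 1328]


def deseasonalize(y, period=4):
    """Additive dummy-variable seasonal adjustment.

    Assumes y starts in the first season. Returns (adjusted, residuals), where
    residuals = y minus its seasonal mean (the fitted value from a regression on a
    full set of seasonal dummies) and adjusted = residuals + overall mean of y."""
    season_means = [mean(y[s::period]) for s in range(period)]
    overall = mean(y)
    residuals = [v - season_means[i % period] for i, v in enumerate(y)]
    adjusted = [r + overall for r in residuals]
    return adjusted, residuals


if __name__ == "__main__":
    adjusted, residuals = deseasonalize(FRIG)
    for i, (r, a) in enumerate(zip(residuals[:4], adjusted[:4])):
        print(f"1978-Q{i + 1}: residual = {r:8.3f}  deseasonalized = {a:9.3f}")
```

The residuals match the last column of Table 9.5, and after adjustment every quarter has the same average, the overall mean of the series.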

What do these residuals represent? They represent the remaining components of the 
refrigerator time series, namely, the trend, cycle, and random components (but see the 
caution given in footnote 15). 

Models (9.7.2) and (9.7.3) do not contain any covariates. Will the picture change if we bring a quantitative regressor into the model? Since expenditure on durable goods is an important factor influencing the demand for refrigerators, let us expand our model (9.7.3) by bringing in this variable. The data for durable goods expenditure in billions of 1982 dollars are already given in Table 9.3. This is our (quantitative) X variable in the model. The regression results are as follows:


$$\hat{Y}_t = 456.2440 + 242.4976\,D_{2t} + 325.2643\,D_{3t} - 86.0804\,D_{4t} + 2.7734\,X_t \qquad (9.7.4)$$

t = (2.5593)* (3.6951)* (4.9421)* (-1.3073)** (4.4496)*

R² = 0.7298

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.


15 Of course, this assumes that the dummy variable technique is an appropriate method of deseasonalizing a time series and that a time series (TS) can be represented as TS = s + c + t + u, where s represents the seasonal, t the trend, c the cyclical, and u the random component. However, if the time series is of the form TS = (s)(c)(t)(u), where the four components enter multiplicatively, the preceding method of deseasonalization is inappropriate, for that method assumes that the four components of a time series are additive. But we will have more to say about this topic in the chapters on time series econometrics.
TABLE 9.5 Refrigerator Sales Regression: Actual, Fitted, and Residual Values (Eq. 9.7.3)

Quarter     Actual    Fitted    Residual
1978-I        1317   1222.12      94.875
1978-II       1615   1467.50     147.500
1978-III      1662   1569.75      92.250
1978-IV       1295   1160.00     135.000
1979-I        1271   1222.12      48.875
1979-II       1555   1467.50      87.500
1979-III      1639   1569.75      69.250
1979-IV       1238   1160.00      78.000
1980-I        1277   1222.12      54.875
1980-II       1258   1467.50    -209.500
1980-III      1417   1569.75    -152.750
1980-IV       1185   1160.00      25.000
1981-I        1196   1222.12     -26.125
1981-II       1410   1467.50     -57.500
1981-III      1417   1569.75    -152.750
1981-IV        919   1160.00    -241.000
1982-I         943   1222.12    -279.125
1982-II       1175   1467.50    -292.500
1982-III      1269   1569.75    -300.750
1982-IV        973   1160.00    -187.000
1983-I        1102   1222.12    -120.125
1983-II       1344   1467.50    -123.500
1983-III      1641   1569.75      71.250
1983-IV       1225   1160.00      65.000
1984-I        1429   1222.12     206.875
1984-II       1699   1467.50     231.500
1984-III      1749   1569.75     179.250
1984-IV       1117   1160.00     -43.000
1985-I        1242   1222.12      19.875
1985-II       1684   1467.50     216.500
1985-III      1764   1569.75     194.250
1985-IV       1328   1160.00     168.000


Again, keep in mind that we are treating the first quarter as our base. As in Eq. (9.7.3), we see that the differential intercept coefficients for the second and third quarters are statistically different from that of the first quarter, but the intercepts of the fourth quarter and the first quarter are statistically about the same. The coefficient of X (durable goods expenditure) of about 2.77 tells us that, allowing for seasonal effects, if expenditure on durable goods goes up by $1 billion (in 1982 dollars), sales of refrigerators go up, on average, by about 2.77 thousand units, that is, by about 2,770 refrigerators; bear in mind that refrigerator sales are in thousands of units and X is in billions of 1982 dollars.

An interesting question here is: Just as sales of refrigerators exhibit seasonal patterns, would not expenditure on durable goods also exhibit seasonal patterns? How then do we take into account seasonality in X? The interesting thing about Eq. (9.7.4) is that the dummy variables in that model remove not only the seasonality in Y but also the seasonality, if any, in X. (This follows from a well-known theorem in statistics, known as the Frisch-Waugh theorem. 16) So to speak, we kill (deseasonalize) two birds (two series) with one stone (the dummy technique).

If you want an informal proof of the preceding statement, just follow these steps: (1) Run the regression of Y on the dummies as in Eq. (9.7.2) or Eq. (9.7.3) and save the residuals, say, $S_1$; these residuals represent deseasonalized Y. (2) Run a similar regression for X and obtain the residuals from this regression, say, $S_2$; these residuals represent deseasonalized X. (3) Regress $S_1$ on $S_2$. You will find that the slope coefficient in this regression is precisely the coefficient of X in the regression (9.7.4).
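The informal proof can be run as a script. The sketch below assumes Python with NumPy and uses the FRIG and DUR columns of Table 9.4, with the first quarter as the benchmark as in Eq. (9.7.3). It compares the slope obtained from steps (1) to (3) with the coefficient of X in the full regression of Eq. (9.7.4), and the two should coincide up to floating-point error. The function names are illustrative.

```python
import numpy as np

# Table 9.4: refrigerator sales (thousands) and durable goods expenditure
# (billions of 1982 dollars), 1978-I through 1985-IV.
FRIG = np.array([1317, 1615, 1662, 1295, 1271, 1555, 1639, 1238,
                 1277, 1258, 1417, 1185, 1196, 1410, 1417, 919,
                 943, 1175, 1269, 973, 1102, 1344, 1641, 1225,
                 1429, 1699, 1749, 1117, 1242, 1684, 1764, 1328], dtype=float)
DUR = np.array([252.6, 272.4, 270.9, 273.9, 268.9, 262.9, 270.9, 263.4,
                260.6, 231.9, 242.7, 248.6, 258.7, 248.4, 255.5, 240.4,
                247.7, 249.1, 251.8, 262.0, 263.3, 280.0, 288.5, 300.5,
                312.6, 322.5, 324.3, 333.1, 344.8, 350.3, 369.1, 356.4])
QUARTER = np.tile([1, 2, 3, 4], 8)


def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def seasonal_design(quarter):
    """Intercept plus D2, D3, D4 (first quarter as benchmark), as in Eq. (9.7.3)."""
    columns = [np.ones(len(quarter))]
    columns += [(quarter == q).astype(float) for q in (2, 3, 4)]
    return np.column_stack(columns)


def frisch_waugh_slope(y, x, quarter):
    """Steps (1)-(3): deseasonalize y and x, then regress S1 on S2."""
    Z = seasonal_design(quarter)
    s1 = y - Z @ ols(Z, y)  # step (1): deseasonalized Y
    s2 = x - Z @ ols(Z, x)  # step (2): deseasonalized X
    # Step (3): both residual series have mean zero, so no intercept is needed.
    return float(s1 @ s2 / (s2 @ s2))


def full_model_slope(y, x, quarter):
    """Coefficient of X in the regression of Eq. (9.7.4)."""
    return float(ols(np.column_stack([seasonal_design(quarter), x]), y)[-1])


if __name__ == "__main__":
    print("Frisch-Waugh slope:", round(frisch_waugh_slope(FRIG, DUR, QUARTER), 4))
    print("Eq. (9.7.4) slope: ", round(full_model_slope(FRIG, DUR, QUARTER), 4))
```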


9.8 Piecewise Linear Regression 

To illustrate yet another use of dummy variables, consider Figure 9.5, which shows how a hypothetical company remunerates its sales representatives. It pays commissions based on sales in such a manner that up to a certain level, the target, or threshold, level X*, there is one (stochastic) commission structure and beyond that level another. (Note: Besides sales, other factors affect sales commission. Assume that these other factors are represented by the stochastic disturbance term.) More specifically, it is assumed that sales commission increases linearly with sales until the threshold level X*, after which it continues to increase linearly with sales but at a much steeper rate. Thus, we have a piecewise linear regression consisting of two linear pieces or segments, which are labeled I and II in Figure 9.5, and the commission function changes its slope at the threshold value. Given the data on commission, sales, and the value of the threshold level X*, the technique of dummy variables can be used to estimate the (differing) slopes of the two segments of the piecewise linear regression shown in Figure 9.5. We proceed as follows:

$$Y_i = \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) D_i + u_i \qquad (9.8.1)$$


FIGURE 9.5 Hypothetical relationship between sales commission and sales volume. (Note: The intercept on the Y axis denotes minimum guaranteed commission.)



16 For proof, see Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar, Lyme, U.K., 1995, 
pp. 150-152. 
where $Y_i$ = sales commission
$X_i$ = volume of sales generated by the salesperson
$X^*$ = threshold value of sales, also known as a knot (known in advance) 17
$D_i$ = 1 if $X_i > X^*$
    = 0 if $X_i < X^*$

Assuming $E(u_i) = 0$, we see at once that

$$E(Y_i \mid D_i = 0, X_i, X^*) = \alpha_1 + \beta_1 X_i \qquad (9.8.2)$$

which gives the mean sales commission up to the target level X*, and

$$E(Y_i \mid D_i = 1, X_i, X^*) = \alpha_1 - \beta_2 X^* + (\beta_1 + \beta_2) X_i \qquad (9.8.3)$$

which gives the mean sales commission beyond the target level X*.

Thus, $\beta_1$ gives the slope of the regression line in segment I, and $\beta_1 + \beta_2$ gives the slope of the regression line in segment II of the piecewise linear regression shown in Figure 9.5. A test of the hypothesis that there is no break in the regression at the threshold value X* can be conducted easily by noting the statistical significance of the estimated differential slope coefficient $\hat{\beta}_2$ (see Figure 9.6).
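Evaluating Eqs. (9.8.2) and (9.8.3) at the knot shows why the constructed regressor is $(X_i - X^*)D_i$ rather than $D_i X_i$. The two segments meet at $X^*$, so the mean commission function bends there but does not jump:

$$
\begin{aligned}
\text{Segment I at } X_i = X^*: &\quad \alpha_1 + \beta_1 X^* \\
\text{Segment II at } X_i = X^*: &\quad \alpha_1 - \beta_2 X^* + (\beta_1 + \beta_2) X^* = \alpha_1 + \beta_1 X^*
\end{aligned}
$$

Had the model used $\beta_2 D_i X_i$ instead, segment II would be $\alpha_1 + (\beta_1 + \beta_2) X_i$, and the two pieces would generally not meet at $X^*$ unless $\beta_2 = 0$.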

Incidentally, the piecewise linear regression we have just discussed is an example of a 
more general class of functions known as spline functions. 18 

FIGURE 9.6 Parameters of the piecewise linear regression.


17 The threshold value may not always be apparent, however. An ad hoc approach is to plot the dependent variable against the explanatory variable(s) and observe if there seems to be a sharp change in the relation after a given value of X (i.e., X*). An analytical approach to finding the break point can be found in the so-called switching regression models. But this is an advanced topic and a textbook discussion may be found in Thomas Fomby, R. Carter Hill, and Stanley Johnson, Advanced Econometric Methods, Springer-Verlag, New York, 1984, Chapter 14.

18 For an accessible discussion on splines (i.e., piecewise polynomials of order k), see Douglas C. Montgomery, Elizabeth A. Peck, and G. Geoffrey Vining, Introduction to Linear Regression Analysis, 3d ed., John Wiley & Sons, New York, 2001, pp. 228-230.
EXAMPLE 9.7 Total Cost in Relation to Output


As an example of the application of the piecewise linear regression, consider the hypothetical total cost-total output data given in Table 9.6. We are told that the total cost may change its slope at the output level of 5,500 units.

Letting Y in Eq. (9.8.1) represent total cost and X total output, we obtain the following results:

$$\hat{Y}_i = -145.72 + 0.2791\,X_i + 0.0945\,(X_i - X^*)D_i \qquad (9.8.4)$$

t = (-0.8245) (6.0669) (1.1447)

R² = 0.9737   X* = 5,500

As these results show, the marginal cost of production is about 28 cents per unit, and although it is about 37 cents (28 + 9) for output over 5,500 units, the difference between the two is not statistically significant, because the coefficient of the dummy term $(X_i - X^*)D_i$ (t = 1.1447) is not significant at, say, the 5 percent level. For all practical purposes, then, one can regress total cost on total output, dropping the dummy variable.


TABLE 9.6 Hypothetical Data on Output and Total Cost

Cost, Dollars    Output, Units
256              1,000
414              2,000
634              3,000
778              4,000
1,003            5,000
1,839            6,000
2,081            7,000
2,423            8,000
2,734            9,000
2,914            10,000
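As a check on Eq. (9.8.4), the piecewise regression can be re-estimated directly from the Table 9.6 data. The following is a minimal Python sketch, assuming NumPy is available; the function name `fit_piecewise` is hypothetical and not from the text. It builds the regressor (X_i - X*)D_i, with D_i = 1 when output exceeds X* = 5,500, and computes the OLS coefficients, their conventional t ratios, and R².

```python
import numpy as np

# Table 9.6: hypothetical output (units) and total cost (dollars)
output = np.linspace(1000.0, 10000.0, 10)
cost = np.array([256, 414, 634, 778, 1003, 1839, 2081, 2423, 2734, 2914], dtype=float)
X_STAR = 5500.0  # threshold output level from Example 9.7


def fit_piecewise(x, y, x_star):
    """OLS fit of Y = a1 + b1*X + b2*(X - X*)*D, with D = 1 if X > X*, else 0.

    Returns (coefficients, t ratios, R-squared).
    """
    d = (x > x_star).astype(float)
    design = np.column_stack([np.ones_like(x), x, (x - x_star) * d])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, k = design.shape
    sigma2 = resid @ resid / (n - k)                 # unbiased estimate of the error variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # estimated covariance matrix of the OLS estimators
    t_ratios = coef / np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, t_ratios, r2


coef, t_ratios, r2 = fit_piecewise(output, cost, X_STAR)
print("coefficients:", np.round(coef, 4))
print("t ratios:    ", np.round(t_ratios, 4))
print("R^2:         ", round(r2, 4))
print("marginal cost above X*:", round(coef[1] + coef[2], 4))
```

With output measured in units, the second and third coefficients are the per-unit marginal cost below X* and its change above X*; up to rounding they reproduce the figures in Eq. (9.8.4), including the insignificant t ratio of about 1.14 on the dummy term.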


9.9 Panel Data Regression Models 

Recall that in Chapter 1 we discussed a variety of data that are available for empirical analysis, such as cross-section, time series, pooled (a combination of time series and cross-section data), and panel data. The dummy variable technique can easily be extended to pooled and panel data. Since the use of panel data is becoming increasingly common in applied work, we will consider this topic in some detail in Chapter 16.


9.10 Some Technical Aspects of the Dummy Variable Technique 

The Interpretation of Dummy Variables 
in Semilogarithmic Regressions 

In Chapter 6 we discussed the log-lin models, where the regressand is logarithmic and the regressors are linear. In such a model, the slope coefficients of the regressors give the semielasticity, that is, the percentage change in the regressand for a unit change in the regressor. This is so only if the regressor is quantitative. What happens if a regressor is a dummy variable? To be specific, consider the following model:


$$\ln Y_i = \beta_1 + \beta_2 D_i + u_i \qquad (9.10.1)$$


where Y = hourly wage rate ($) and D = 1 for female and 0 for male. 

How do we interpret such a model? Assuming $E(u_i) = 0$, we obtain:

Wage function for male workers:

$$E(\ln Y_i \mid D_i = 0) = \beta_1 \qquad (9.10.2)$$

Wage function for female workers:

$$E(\ln Y_i \mid D_i = 1) = \beta_1 + \beta_2 \qquad (9.10.3)$$

Therefore, the intercept $\beta_1$ gives the mean log hourly earnings of male workers, and the "slope" coefficient $\beta_2$ gives the difference in the mean log hourly earnings of female and male workers. This is a rather awkward way of stating things. But if we take the antilog of $\beta_1$, what we obtain is not the mean hourly wages of male workers, but their median wages (strictly, this holds when the distribution of log wages is symmetric, as with the normal distribution). As you know, mean, median, and mode are the three measures of central tendency of a random variable. And if we take the antilog of $(\beta_1 + \beta_2)$, we obtain the median hourly wages of female workers.
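The claim that the antilog of a mean log wage is a median rather than a mean can be checked by simulation. The Python sketch below assumes NumPy and, only for illustration, assumes that log wages are normally distributed within each group (so wages are log-normal); the group parameters 2.2, 1.95, and 0.5 are hypothetical. It regresses ln Y on a constant and D as in Eq. (9.10.1) and compares the antilog of the intercept with the sample median and mean of the male wages.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical log wages, normal within each group (so wages are log-normal)
n_male, n_female = 50_000, 50_000
log_w_male = rng.normal(loc=2.2, scale=0.5, size=n_male)
log_w_female = rng.normal(loc=1.95, scale=0.5, size=n_female)

log_wage = np.concatenate([log_w_male, log_w_female])
d_female = np.concatenate([np.zeros(n_male), np.ones(n_female)])  # D = 1 for female, 0 for male

# OLS of ln(Y) on a constant and the dummy, as in Eq. (9.10.1)
design = np.column_stack([np.ones_like(d_female), d_female])
coef, *_ = np.linalg.lstsq(design, log_wage, rcond=None)
b1, b2 = coef

wage_male = np.exp(log_w_male)
print(f"b1 = {b1:.4f}, mean log male wage = {log_w_male.mean():.4f}")
print(f"exp(b1)          = {np.exp(b1):.4f}")
print(f"median male wage = {np.median(wage_male):.4f}")
print(f"mean male wage   = {wage_male.mean():.4f}")
```

Under this assumption the OLS intercept equals the mean log wage of the D = 0 group, its antilog lies close to that group's median wage, and the group's mean wage is larger by a factor of roughly $e^{\sigma^2/2}$.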


EXAMPLE 9.8 Logarithm of Hourly Wages in Relation to Gender


To illustrate Eq. (9.10.1), we use the data that underlie Example 9.2. The regression results based on 528 observations are as follows:

$$\widehat{\ln Y_i} = 2.1763 - 0.2437D_i \qquad (9.10.4)$$

t = (72.2943)*   (-5.5048)*

R² = 0.0544

where * indicates that the p values are practically zero.

Taking the antilog of 2.1763, we find 8.8136 ($), which is the median hourly earnings of male workers, and taking the antilog of (2.1763 - 0.2437) = 1.9326, we obtain about 6.9074 ($), which is the median hourly earnings of female workers. Thus, the female workers' median hourly earnings are lower by about 21.63 percent compared to their male counterparts [(8.8136 - 6.9074)/8.8136].

Interestingly, we can obtain the semielasticity for a dummy regressor directly by the device suggested by Halvorsen and Palmquist.19 Take the antilog (to base e) of the estimated dummy coefficient, subtract 1 from it, and multiply the difference by 100. (For the underlying logic, see Appendix 9.A.1.) Therefore, if you take the antilog of -0.2437, you will obtain about 0.7837. Subtracting 1 from this gives about -0.2163. After multiplying this by 100, we get about -21.63 percent, suggesting that a female worker's (D = 1) median hourly earnings are lower than those of her male counterpart by about 21.63 percent, the same as we obtained previously. This is no coincidence: since $e^{\beta_1 + \beta_2}/e^{\beta_1} - 1 = e^{\beta_2} - 1$, the two calculations are algebraically identical.
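The arithmetic of Example 9.8 is easy to reproduce. Below is a minimal Python sketch using only the standard library, with the coefficient values taken from Eq. (9.10.4); it computes both medians and shows that the percentage gap between them equals the Halvorsen-Palmquist figure.

```python
import math

# Estimates from Eq. (9.10.4): ln(Y) = b1 + b2*D, with D = 1 for female workers
b1, b2 = 2.1763, -0.2437

median_male = math.exp(b1)         # antilog of the intercept
median_female = math.exp(b1 + b2)  # antilog of intercept plus dummy coefficient

# Percentage gap computed from the two medians
pct_from_medians = 100.0 * (median_female - median_male) / median_male
# Halvorsen-Palmquist shortcut: 100 * (e^b2 - 1)
pct_halvorsen_palmquist = 100.0 * (math.exp(b2) - 1.0)

print(f"median male wage:    ${median_male:.4f}")
print(f"median female wage:  ${median_female:.4f}")
print(f"gap from medians:    {pct_from_medians:.2f}%")
print(f"Halvorsen-Palmquist: {pct_halvorsen_palmquist:.2f}%")
```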


Dummy Variables and Heteroscedasticity 

Let us revisit our savings-income regression for the United States for the periods 1970-1981 and 1982-1995 and for the entire period 1970-1995. In testing for structural stability using the dummy technique, we assumed that $\mathrm{var}(u_{1i}) = \mathrm{var}(u_{2i}) = \sigma^2$, that is, that the error variances in the two periods were the same. This was also the assumption underlying the Chow test. If this assumption is not valid, that is, if the error variances in the two subperiods are different, it is quite possible to draw misleading conclusions. Therefore, one must first check the equality of the variances in the two subperiods, using suitable statistical techniques. Although we will discuss this topic more thoroughly in the chapter on heteroscedasticity, in Chapter 8 we showed how the F test can be used for this purpose.20 (See our discussion of the Chow test in that chapter.) As we showed there, it seems the error variances in the two periods are not the same. Hence, the results of both the Chow test and the dummy variable technique presented before may not be entirely reliable. Of course, our purpose here is to illustrate the various techniques that one can use to handle a problem (e.g., the problem of structural stability). In any particular application, these techniques may not be valid. But that is par for the course with most statistical techniques. One can, however, take appropriate remedial actions to resolve the problem, as we will do in the chapter on heteroscedasticity later (however, see Exercise 9.28).

19 Robert Halvorsen and Raymond Palmquist, "The Interpretation of Dummy Variables in Semilogarithmic Equations," American Economic Review, vol. 70, no. 3, pp. 474-475.
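One common version of the variance-equality check referred to above is a variance-ratio F test: divide the larger subperiod error-variance estimate, RSS/(n - k), by the smaller and refer the ratio to the F distribution with the corresponding degrees of freedom. The Python sketch below assumes SciPy is available; the function name `variance_ratio_test` and the RSS values in the usage line are hypothetical, not the savings-income figures from Chapter 8.

```python
from scipy import stats


def variance_ratio_test(rss1, n1, rss2, n2, k):
    """Variance-ratio F test of equal error variances in two subperiod regressions.

    rss1, rss2 -- residual sums of squares from the separate regressions
    n1, n2     -- number of observations in each subperiod
    k          -- number of parameters estimated in each regression
    Returns (F, numerator df, denominator df, two-sided p value).
    """
    df1, df2 = n1 - k, n2 - k
    s1, s2 = rss1 / df1, rss2 / df2  # unbiased error-variance estimates
    if s1 < s2:                      # put the larger estimate in the numerator
        s1, s2, df1, df2 = s2, s1, df2, df1
    f_stat = s1 / s2
    p_two_sided = min(1.0, 2.0 * stats.f.sf(f_stat, df1, df2))
    return f_stat, df1, df2, p_two_sided


# Hypothetical inputs: 12 and 14 observations, k = 2 parameters in each regression
f_stat, df_num, df_den, p = variance_ratio_test(100.0, 12, 500.0, 14, 2)
print(f"F = {f_stat:.4f} with ({df_num}, {df_den}) df, two-sided p = {p:.4f}")
```

The test itself assumes normally distributed errors; a small p value is evidence against equal variances, in which case the Chow test and the dummy variable comparison should be interpreted with caution.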

Dummy Variables and Autocorrelation 

Besides homoscedasticity, the classical linear regression model assumes that the error terms in the regression model are uncorrelated. But what happens if that is not the case, especially in models involving dummy regressors? Since we will discuss the topic of autocorrelation in depth in the chapter on autocorrelation, we will defer the answer to this question until then.


What Happens If the Dependent Variable 
Is a Dummy Variable? 

So far we have considered models in which the regressand is quantitative and the regressors 
are quantitative or qualitative or both. But there are occasions where the regressand can 
also be qualitative or dummy. Consider, for example, the decision of a worker to participate 
in the labor force. The decision to participate is of the yes or no type, yes if the person 
decides to participate and no otherwise. Thus, the labor force participation variable is a 
dummy variable. Of course, the decision to participate in the labor force depends on several 
factors, such as the starting wage rate, education, and conditions in the labor market 
(as measured by the unemployment rate). 

Can we still use ordinary least squares (OLS) to estimate regression models where the regressand is a dummy? Mechanically, yes. But one faces several statistical problems in such models. And since there are alternatives to OLS estimation that do not face these problems, we will discuss this topic in a later chapter (see Chapter 15 on logit and probit models). In that chapter we will also discuss models in which the regressand has more than two categories; for example, the decision to travel to work by car, bus, or train, or the decision to work part-time, full-time, or not at all. Such models are called polytomous dependent variable models, in contrast to dichotomous dependent variable models, in which the dependent variable has only two categories.


20 The Chow test procedure can be performed even in the presence of heteroscedasticity, but then one will have to use the Wald test. The mathematics behind the test is somewhat involved, but we will revisit this topic in the chapter on heteroscedasticity.
9.11 Topics for Further Study

The literature discusses several rather advanced topics related to dummy variables, including (1) random, or varying, parameters models, (2) switching regression models, and (3) disequilibrium models.

In the regression models considered in this text it is assumed that the parameters, the β's, are unknown but fixed entities. The random coefficient models (and there are several versions of them) assume the β's can be random too. A major reference work in this area is by Swamy.21

In the dummy variable model using both differential intercepts and slopes, it is implicitly assumed that we know the point of break. Thus, in our savings-income example for 1970-1995, we divided the period into 1970-1981 and 1982-1995, the pre- and postrecession periods, under the belief that the recession in 1982 changed the relation between savings and income. Sometimes it is not easy to pinpoint when the break has taken place. The technique of switching regression models (SRM) has been developed for such situations. SRM treats the breakpoint as a random variable and through an iterative process determines when the break might have actually taken place. The seminal work in this area is by Goldfeld and Quandt.22

Special estimation techniques are required to deal with what are known as disequilibrium situations, that is, situations where markets do not clear (i.e., demand is not equal to supply). The classic example is that of demand for and supply of a commodity. The demand for a commodity is a function of its price and other variables, and the supply of the commodity is a function of its price and other variables, some of which are different from those entering the demand function. Now the quantity of the commodity actually bought and sold may not necessarily be equal to the one obtained by equating demand to supply, thus leading to disequilibrium. For a thorough discussion of disequilibrium models, the reader may refer to Quandt.23

9.12 A Concluding Example 

We end this chapter with an example that illustrates some of the points made in this chapter. Table 9.7 provides data on a sample of 261 workers in an industrial town in southern India in 1990.

The variables are defined as follows:

WI = weekly wage income in rupees
Age = age in years
DSEX = 1 for female workers and 0 for male workers
DE2 = a dummy variable taking a value of 1 for workers with education up to the primary level
DE3 = a dummy variable taking a value of 1 for workers with education up to the secondary level
DE4 = a dummy variable taking a value of 1 for workers with higher than secondary education
DPT = a dummy variable taking a value of 1 for workers with permanent jobs and a value of 0 for temporary workers


21 P. A. V. B. Swamy, Statistical Inference in Random Coefficient Regression Models, Springer-Verlag, Berlin, 1971.

22 S. Goldfeld and R. Quandt, Nonlinear Methods in Econometrics, North Holland, Amsterdam, 1972. 
23 Richard E. Quandt, The Econometrics of Disequilibrium, Basil Blackwell, New York, 1988. 




TABLE 9.7 Indian Wage Earners, 1990

   WI  AGE  DE2  DE3  DE4  DPT  DSEX   |   WI  AGE  DE2  DE3  DE4  DPT  DSEX
  120   57    0    0    0    0     0   |  120   21    0    0    0    0     0
  224   48    0    0    1    1     0   |   25   18    0    0    0    0     1
  132   38    0    0    0    0     0   |   25   11    0    0    0    0     1
   75   27    0    1    0    0     0   |   30   38    0    0    0    1     1
  111   23    0    1    0    0     1   |   30   17    0    0    0    1     1
  127   22    0    1    0    0     0   |  122   20    0    0    0    0     0
   30   18    0    0    0    0     0   |  288   50    0    1    0    1     0
   24   12    0    0    0    0     0   |   75   45    0    0    0    0     1
  119   38    0    0    0    1     0   |   79   60    0    0    0    0     0
   75   55    0    0    0    0     0   | 85.3   26    1    0    0    0     1
  324   26    0    1    0    0     0   |  350   42    0    1    0    1     0
   42   18    0    0    0    0     0   |   54   62    0    0    0    1     0
  100   32    0    0    0    0     0   |  110   23    0    0    0    0     0
  136   41    0    0    0    0     0   |  342   56    0    0    0    1     0
  107   48    0    0    0    0     0   | 77.5   19    0    0    0    1     0
   50   16    1    0    0    0     1   |  370   46    0    0    0    0     0
   90   45    0    0    0    0     0   |  156   26    0    0    0    1     0
  377   46    0    0    0    1     0   |  261   23    0    0    0    0     0
  150   30    0    1    0    0     0   |   54   16    0    1    0    0     0
  162   40    0    0    0    0     0   |  130   33    0    0    0    0     0
   18   19    1    0    0    0     0   |  112   27    1    0    0    0     0
  128   25    1    0    0    0     0   |   82   22    1    0    0    0     0
 47.5   46    0    0    0    0     1   |  385   30    0    1    0    1     0
  135   25    0    1    0    0     0   | 94.3   22    0    0    1    1     1
  400   57    0    0    0    1     0   |  350   57    0    0    0    1     0
 91.8   35    0    0    1    1     0   |  108   26    0    0    0    0     0
  140   44    0    0    0    1     0   |   20   14    0    0    0    0     0
 49.2   22    0    0    0    0     0   | 53.8   14    0    0    0    0     1
   30   19    1    0    0    0     0   |  427   55    0    0    0    1     0
 40.5   37    0    0    0    0     1   |   18   12    0    0    0    0     0
   81   20    0    0    0    0     0   |  120   38    0    0    0    0     0
  105   40    0    0    0    0     0   | 40.5   17    0    0    0    0     0
  200   30    0    0    0    0     0   |  375   42    1    0    0    1     0
  140   30    0    0    0    1     0   |  120   34    0    0    0    0     0
   80   26    0    0    0    0     0   |  175   33    1    0    0    1     0
   47   41    0    0    0    0     1   |   50   26    0    0    0    0     1
  125   22    0    0    0    0     0   |  100   33    1    0    0    1     0
  500   21    0    0    0    0     0   |   25   22    0    0    0    1     1
  100   19    0    0    0    0     0   |   40   15    0    0    0    1     0
  105   35    0    0    0    0     0   |   65   14    0    0    0    1     0
  300   35    0    1    0    1     0   | 47.5   25    0    0    0    1     1
  115   33    0    1    0    1     1   |  163   25    0    0    0    1     0
  103   27    0    0    1    1     1   |  175   50    0    0    0    1     1
  190   62    1    0    0    0     0   |  150   24    0    0    0    1     1
 62.5   18    0    1    0    0     0   |  163   28    0    0    0    1     0
   50   25    1    0    0    0     0   |  163   30    1    0    0    1     0
  273   43    0    0    1    1     1   |   50   25    0    0    0    1     1
  175   40    0    1    0    1     0   |  395   45    0    1    0    1     0
  117   26    1    0    0    1     0   |  175   40    0    0    0    1     1
  950   47    0    0    1    0     0   | 87.5   25    1    0    0    0     0
  100   30    0    0    0    0     0   |   75   18    0    0    0    0     0
  140   30    0    0    0    0     0   |  163   24    0    0    0    1     0
   97   25    0    1    0    0     0   |  325   55    0    0    0    1     0
  150   36    0    0    0    0     0   |  121   27    0    1    0    0     0
   25   28    0    0    0    0     1   |  600   35    1    0    0    0     0
   15   13    0    0    0    0     1   |   52   19    0    0    0    0     0
  131   55    0    0    0    0     0   |  117   28    1    0    0    0     0
The reference category is male workers with no primary education and temporary jobs. Our interest is in finding out how weekly wages relate to age, sex, level of education, and job tenure. For this purpose, we estimate the following regression model:


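$$\ln \mathrm{WI}_i = \beta_1 + \beta_2\,\mathrm{AGE}_i + \beta_3\,\mathrm{DSEX}_i + \beta_4\,\mathrm{DE2}_i + \beta_5\,\mathrm{DE3}_i + \beta_6\,\mathrm{DE4}_i + \beta_7\,\mathrm{DPT}_i + u_i$$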


Following the literature in labor economics, we express the (natural) log of wages as a function of the explanatory variables. As noted in Chapter 6, the size distribution of variables such as wages tends to be skewed; logarithmic transformations of such variables reduce both skewness and heteroscedasticity.

Using EViews 6, we obtain the following regression results.


Dependent Variable: Ln(WI)
Method: Least Squares
Sample: 1 261
Included observations: 261

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C                3.706872     0.113845     32.56055     0.0000
AGE              0.026549     0.003117      8.516848    0.0000
DSEX            -0.656338     0.088796     -7.391529    0.0000
DE2              0.113862     0.098542      1.155473    0.2490
DE3              0.412589     0.096383      4.280732    0.0000
DE4              0.554129     0.155224      3.569862    0.0004
DPT              0.558348     0.079990      6.980248    0.0000

R-squared             0.534969    Mean dependent var.       4.793390
Adjusted R-squared    0.523984    S.D. dependent var.       0.834277
S.E. of regression    0.575600    Akaike info criterion     1.759648
Sum squared resid.    84.15421    Schwarz criterion         1.855248
Log likelihood       -222.6340    Hannan-Quinn criter.      1.798076
F-statistic           48.70008    Durbin-Watson stat.       1.853361
Prob(F-statistic)     0.000000





These results show that the logarithm of wages is positively related to age, education, and job permanency but negatively related to the gender dummy, an unsurprising finding. Although there seems to be no statistically significant difference in the weekly wages of workers with primary or less-than-primary education (the DE2 coefficient has a p value of about 0.25), the weekly wages are higher for workers with secondary education and much more so for workers with higher education.

The coefficients of the dummy variables are to be interpreted as differential values relative to the reference category. Thus, the coefficient of the DPT variable suggests that workers who have permanent jobs on average earn more than workers whose jobs are temporary.

As we know from Chapter 6, in a log-lin model (dependent variable in logarithmic form and the explanatory variables in linear form), the slope coefficient of an explanatory variable represents a semielasticity, that is, it gives the relative or percentage change in the dependent variable for a unit change in the value of the explanatory variable. But as noted in Section 9.10, when the explanatory variable is a dummy variable, we have to be very careful. Here we have to take the antilog of the estimated dummy coefficient, subtract 1 from it, and multiply the result by 100. Thus, to find the percentage change in weekly wages for workers who have permanent jobs versus those who have temporary jobs, we take the antilog of the DPT coefficient of 0.558348, subtract 1, and then multiply the difference by 100. For our example, this turns out to be $(e^{0.558348} - 1) = (1.74778 - 1) = 0.74778$, or about 75 percent. The reader is advised to calculate such percentage changes for the other dummy variables included in the model.
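The calculation the reader is asked to perform can be sketched in a few lines of Python, using only the standard library. The coefficient values are copied from the first regression output above; the helper name `pct_change` is hypothetical. Each figure is the percentage difference in weekly wages relative to the reference category, holding the other regressors constant.

```python
import math

# Dummy coefficients from the first Indian wage regression (log-lin model)
dummy_coefs = {
    "DSEX": -0.656338,
    "DE2": 0.113862,
    "DE3": 0.412589,
    "DE4": 0.554129,
    "DPT": 0.558348,
}


def pct_change(coef):
    """Percentage change in WI when the dummy switches from 0 to 1: 100 * (e^coef - 1)."""
    return 100.0 * (math.exp(coef) - 1.0)


pct = {name: pct_change(b) for name, b in dummy_coefs.items()}
for name, value in pct.items():
    print(f"{name:<5} coefficient {dummy_coefs[name]:+.6f} -> {value:+.2f}%")
```

For DPT this gives roughly 74.78 percent, matching the hand calculation; for DSEX the figure is negative because its coefficient is negative.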

Our results show that gender and education have differential effects on weekly earnings. 
Is it possible that there is an interaction between gender and the level of education? Do 
male workers with higher education earn higher weekly wages than female workers with 
higher education? To examine this possibility, we can extend the above wage regression by 
interacting gender with education. The regression results are as follows: 


Dependent Variable: Ln(WI)
Method: Least Squares
Sample: 1 261
Included observations: 261

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C                3.717540     0.114536     32.45734     0.0000
AGE              0.027051     0.003133      8.634553    0.0000
DSEX            -0.758975     0.110410     -6.874148    0.0000
DE2              0.088923     0.106827      0.832402    0.4060
DE3              0.350574     0.104309      3.360913    0.0009
DE4              0.438673     0.186996      2.345898    0.0198
DSEX*DE2         0.114908     0.275039      0.417788    0.6765
DSEX*DE3         0.391052     0.259261      1.508337    0.1327
DSEX*DE4         0.369520     0.313503      1.178681    0.2396
DPT              0.551658     0.080076      6.889198    0.0000

R-squared             0.540810    Mean dependent var.       4.793390
Adjusted R-squared    0.524345    S.D. dependent var.       0.834277
S.E. of regression    0.575382    Akaike info criterion     1.769997
Sum squared resid.    83.09731    Schwarz criterion         1.906569
Log likelihood       -220.9847    Hannan-Quinn criter.      1.824895
F-statistic           32.84603    Durbin-Watson stat.       1.856488
Prob(F-statistic)     0.000000




Although the interaction dummies suggest that there is some interaction between gender and the level of education, the effect is not statistically significant: none of the interaction coefficients is individually statistically significant.
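Individual t tests do not by themselves settle whether the three interaction coefficients are jointly zero. One way to check this, sketched below under the classical assumptions and assuming SciPy is available, is the restricted least-squares F test, F = [(RSS_R - RSS_UR)/m]/[RSS_UR/(n - k)], using the residual sums of squares printed in the two regression outputs (84.15421 without the interactions, 83.09731 with them), with m = 3 restrictions and k = 10 parameters in the unrestricted model.

```python
from scipy import stats

# Residual sums of squares printed in the two EViews outputs above
rss_restricted = 84.15421    # model without the DSEX*DE interaction terms
rss_unrestricted = 83.09731  # model with the three interaction terms
n = 261                      # observations
k_unrestricted = 10          # parameters in the model with interactions
m = 3                        # restrictions: the three interaction coefficients are zero

df_den = n - k_unrestricted
f_stat = ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / df_den)
p_value = stats.f.sf(f_stat, m, df_den)  # upper-tail probability of F(m, df_den)
print(f"F = {f_stat:.4f} on ({m}, {df_den}) df, p = {p_value:.4f}")
```

With these figures F is about 1.06 on (3, 251) degrees of freedom, well below conventional critical values, so the interaction terms also appear to be jointly insignificant.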
Interestingly, if we drop the education dummies but retain the interaction dummies, we obtain the following results:

Dependent Variable: LOG(WI)
Method: Least Squares
Sample: 1 261
Included observations: 261

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C                3.836483     0.106785     35.92735     0.0000
AGE              0.025990     0.003170      8.197991    0.0000
DSEX            -0.868617     0.106429     -8.161508    0.0000
DSEX*DE2         0.200823     0.253511      0.773851    0.4397
DSEX*DE3         0.716722     0.245021      2.925140    0.0038
DSEX*DE4         0.752652     0.265975      2.829789    0.0050
DPT              0.627272     0.078869      7.953332    0.0000

R-squared             0.514449    Mean dependent var.       4.793390
Adjusted R-squared    0.502979    S.D. dependent var.       0.834277
S.E. of regression    0.588163    Akaike info criterion     1.802828
Sum squared resid.    87.86766    Schwarz criterion         1.898429
Log likelihood       -228.2691    Hannan-Quinn criter.      1.841257
F-statistic           44.85284    Durbin-Watson stat.       1.873421
Prob(F-statistic)     0.000000

It now seems that education dummies by themselves have no effect on weekly wages, but introduced in an interactive format they seem to. As this exercise shows, one must be careful in the use of dummy variables. It is left as an exercise for the reader to find out if the education dummies interact with DPT.


Summary and Conclusions

1. Dummy variables, taking values of 1 and zero (or their linear transforms), are a means 
of introducing qualitative regressors in regression models. 

2. Dummy variables are a data-classifying device in that they divide a sample into various 
subgroups based on qualities or attributes (gender, marital status, race, religion, etc.) 
and implicitly allow one to run individual regressions for each subgroup. If there are 
differences in the response of the regressand to the variation in the qualitative variables 
in the various subgroups, they will be reflected in the differences in the intercepts or 
slope coefficients, or both, of the various subgroup regressions. 

3. Although a versatile tool, the dummy variable technique needs to be handled carefully. First, if the regression contains a constant term, the number of dummy variables must be one less than the number of classifications of each qualitative variable. Second, the coefficients attached to the dummy variables must always be interpreted in relation to the base, or reference, group, that is, the group that receives the value of zero. The base chosen will depend on the purpose of the research at hand. Finally, if a model has several qualitative variables with several classes, introduction of dummy variables can consume a large number of degrees of freedom. Therefore, one should always weigh the number of dummy variables to be introduced against the total number of observations available for analysis.
4. Of the many applications of dummy variables, this chapter considered only a few. These included (1) comparing two (or more) regressions, (2) deseasonalizing time series data, (3) interactive dummies, (4) interpretation of dummies in semilog models, and (5) piecewise linear regression models.

5. We also sounded cautionary notes on the use of dummy variables in situations of heteroscedasticity and autocorrelation. Since we will cover these topics fully in subsequent chapters, we will revisit them then.
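The first caution in point 3 (the so-called dummy variable trap) can be seen directly from the rank of the design matrix. In the hypothetical Python sketch below, which assumes NumPy and uses made-up data, six workers fall into four education categories. Including an intercept together with all four category dummies makes the columns linearly dependent, because the dummies sum to the intercept column, whereas dropping one category (the base) restores full column rank.

```python
import numpy as np

# Hypothetical education categories for six workers:
# 0 = below primary (base), 1 = primary, 2 = secondary, 3 = higher
edu = np.array([0, 1, 2, 3, 1, 2])
n = edu.size

intercept = np.ones((n, 1))
all_dummies = (edu[:, None] == np.arange(4)).astype(float)  # one 0/1 column per category

trap = np.hstack([intercept, all_dummies])       # intercept + 4 dummies: columns are collinear
ok = np.hstack([intercept, all_dummies[:, 1:]])  # intercept + 3 dummies: base category dropped

print("with all 4 dummies:", np.linalg.matrix_rank(trap), "of", trap.shape[1], "columns")
print("with 3 dummies:    ", np.linalg.matrix_rank(ok), "of", ok.shape[1], "columns")
```

Because the first design matrix is rank deficient, X'X cannot be inverted and OLS has no unique solution; the second matrix avoids the problem.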


EXERCISES 


Questions 

9.1. If you have monthly data over a number of years, how many dummy variables will 
you introduce to test the following hypotheses: 

a. All the 12 months of the year exhibit seasonal patterns. 

b. Only February, April, June, August, October, and December exhibit seasonal 
patterns. 

9.2. Consider the following regression results (t ratios are in parentheses):*

$$\hat{Y}_i = 1286 + 104.97X_{2i} - 0.026X_{3i} + 1.20X_{4i} + 0.69X_{5i} - 19.47X_{6i} + 266.06X_{7i} - 118.64X_{8i} - 110.61X_{9i}$$

t = (4.67) (3.70) (-3.80) (0.24) (0.08) (-0.40) (6.94) (-3.04) (-6.14)

R² = 0.383   n = 1543

where Y = wife's annual desired hours of work, calculated as usual hours of work per year plus weeks looking for work
X2 = after-tax real average hourly earnings of wife
X3 = husband's previous year after-tax real annual earnings
X4 = wife's age in years
X5 = years of schooling completed by wife
X6 = attitude variable, 1 if respondent felt that it was all right for a woman to work if she desired and her husband agrees, 0 otherwise
X7 = attitude variable, 1 if the respondent's husband favored his wife's working, 0 otherwise
X8 = number of children less than 6 years of age
X9 = number of children in age groups 6 to 13

a. Do the signs of the coefficients of the various nondummy regressors make economic sense? Justify your answer.

b. How would you interpret the dummy variables, X6 and X7? Are these dummies statistically significant? Since the sample is quite large, you may use the "2-t" rule of thumb to answer the question.

c. Why do you think that age and education variables are not significant factors in a woman's labor force participation decision in this study?


*Jane Leuthold, "The Effect of Taxation on the Hours Worked by Married Women," Industrial and Labor Relations Review, no. 4, July 1978, pp. 520-526 (notation changed to suit our format).
TABLE 9.8 Data Matrix for Regression in Exercise 9.3

Year and    Unemployment  Job Vacancy                 Year and    Unemployment  Job Vacancy
Quarter     Rate UN, %    Rate V, %     D     DV      Quarter     Rate UN, %    Rate V, %     D     DV
1958-IV     1.915         0.510         0     0       1965-I      1.201         0.997         0     0
1959-I      1.876         0.541         0     0       1965-II     1.192         1.035         0     0
1959-II     1.842         0.541         0     0       1965-III    1.259         1.040         0     0
1959-III    1.750         0.690         0     0       1965-IV     1.192         1.086         0     0
1959-IV     1.648         0.771         0     0       1966-I      1.089         1.101         0     0
1960-I      1.450         0.836         0     0       1966-II     1.101         1.058         0     0
1960-II     1.393         0.908         0     0       1966-III    1.243         0.987         0     0
1960-III    1.322         0.968         0     0       1966-IV     1.623         0.819         1     0.819
1960-IV     1.260         0.998         0     0       1967-I      1.821         0.740         1     0.740
1961-I      1.171         0.968         0     0       1967-II     1.990         0.661         1     0.661
1961-II     1.182         0.964         0     0       1967-III    2.114         0.660         1     0.660
1961-III    1.221         0.952         0     0       1967-IV     2.115         0.698         1     0.698
1961-IV     1.340         0.849         0     0       1968-I      2.150         0.695         1     0.695
1962-I      1.411         0.748         0     0       1968-II     2.141         0.732         1     0.732
1962-II     1.600         0.658         0     0       1968-III    2.167         0.749         1     0.749
1962-III    1.780         0.562         0     0       1968-IV     2.107         0.800         1     0.800
1962-IV     1.941         0.510         0     0       1969-I      2.104         0.783         1     0.783
1963-I      2.178         0.510         0     0       1969-II     2.056         0.800         1     0.800
1963-II     2.067         0.544         0     0       1969-III    2.170         0.794         1     0.794
1963-III    1.942         0.568         0     0       1969-IV     2.161         0.790         1     0.790
1963-IV     1.764         0.677         0     0       1970-I      2.225         0.757         1     0.757
1964-I      1.532         0.794         0     0       1970-II     2.241         0.746         1     0.746
1964-II     1.455         0.838         0     0       1970-III    2.366         0.739         1     0.739
1964-III    1.409         0.885         0     0       1970-IV     2.324         0.707         1     0.707
1964-IV     1.296         0.978         0     0       1971-I      2.516*        0.583*        1     0.583*
                                                      1971-II     2.909*        0.524*        1     0.524*

Source: Damodar Gujarati, "The Behaviour of Unemployment and Unfilled Vacancies: Great Britain, 1958-1971," The Economic Journal, vol. 82, March 1972, p. 202.

9.3. Consider the following regression results.* (The actual data are in Table 9.8.)

$$\widehat{UN}_t = 2.7491 + 1.1507D_t - 1.5294V_t - 0.8511(D_tV_t)$$

t = (26.896) (3.6288) (-12.5552) (-1.9819)

R² = 0.9128

where UN = unemployment rate, %
V = job vacancy rate, %
D = 1 for the period beginning in 1966-IV
  = 0 for the period before 1966-IV
t = time, measured in quarters

Note: In the fourth quarter of 1966, the (then) Labor government liberalized the National Insurance Act by replacing the flat-rate system of short-term unemployment benefits with a mixed system of flat-rate and (previous) earnings-related benefits, which increased the level of unemployment benefits.


*Damodar Gujarati, "The Behaviour of Unemployment and Unfilled Vacancies: Great Britain, 1958-1971," The Economic Journal, vol. 82, March 1972, pp. 195-202.
a. What are your prior expectations about the relationship between the unemployment and vacancy rates?

b. Holding the job vacancy rate constant, what is the average unemployment rate in 
the period beginning in the fourth quarter of 1966? Is it statistically different from 
the period before 1966 fourth quarter? How do you know? 

c. Are the slopes in the pre- and post-1966 fourth quarter statistically different? How 
do you know? 

d. Is it safe to conclude from this study that generous unemployment benefits lead to 
higher unemployment rates? Does this make economic sense? 

9.4. From annual data for 1972-1979, William Nordhaus estimated the following model to explain OPEC's oil price behavior (standard errors in parentheses).*

$$\hat{y}_t = 0.3x_{1t} + 5.22x_{2t}$$

se = (0.03) (0.50)

where y = difference between current and previous year's price (dollars per barrel)
x1 = difference between current year's spot price and OPEC's price in the previous year
x2 = 1 for 1974 and 0 otherwise

Interpret this result and show the results graphically. What do these results suggest about OPEC's monopoly power?

9.5. Consider the following model:

$$Y_i = \alpha_1 + \alpha_2 D_i + \beta X_i + u_i$$

where Y = annual salary of a college professor
X = years of teaching experience
D = dummy for gender

Consider three ways of defining the dummy variable.

a. D = 1 for male, 0 for female.

b. D = 1 for female, 2 for male.

c. D = 1 for female, -1 for male.

Interpret the preceding regression model for each dummy assignment. Is one method preferable to another? Justify your answer.

9.6. Refer to regression (9.7.3). How would you test the hypothesis that the coefficients of D2 and D3 are the same? And that the coefficients of D2 and D4 are the same? If the coefficient of D3 is statistically different from that of D2 and the coefficient of D4 is different from that of D2, does that mean that the coefficients of D3 and D4 are also different?

Hint: var(A ± B) = var(A) + var(B) ± 2 cov(A, B)

9.7. Refer to the U.S. savings-income example discussed in Section 9.5.

a. How would you obtain the standard errors of the regression coefficients given in Eqs. (9.5.5) and (9.5.6), which were obtained from the pooled regression (9.5.4)?

b. To obtain numerical answers, what additional information, if any, is required?


*"Oil and Economic Performance in Industrial Countries," Brookings Papers on Economic Activity, 1980, pp. 341-388.
9.8. In his study on the labor hours spent by the FDIC (Federal Deposit Insurance Corporation) on 91 bank examinations, R. J. Miller estimated the following function:*

$$\widehat{\ln Y} = 2.41 + 0.3674\ln X_1 + 0.2217\ln X_2 + 0.0803\ln X_3 - 0.1755D_1 + 0.2799D_2 + 0.5634D_3 - 0.2572D_4$$

Standard errors, in the order of the slope coefficients: (0.0477) (0.0628) (0.0287) (0.2905) (0.1044) (0.1657) (0.0787)

R² = 0.766

where Y = FDIC examiner labor hours
X1 = total assets of bank
X2 = total number of offices in bank
X3 = ratio of classified loans to total loans for bank
D1 = 1 if management rating was "good"
D2 = 1 if management rating was "fair"
D3 = 1 if management rating was "satisfactory"
D4 = 1 if examination was conducted jointly with the state

The figures in parentheses are the estimated standard errors.

a. Interpret these results.

b. Is there any problem in interpreting the dummy variables in this model since Y is in the log form?

c. How would you interpret the dummy coefficients?

9.9. To assess the effect of the Fed’s policy of deregulating interest rates beginning in July 
1979, Sidney Langer, a student of mine, estimated the following model for the quar¬ 
terly period of 1975-III to 1983-D.f 

$$\hat{Y}_t = 8.5871 - 0.1328P_t - 0.7102\,\mathrm{Un}_t - 0.2389M_t + 0.6592Y_{t-1} + 2.5831\,\mathrm{Dum}_t$$
se = (1.9563) (0.0992) (0.1909) (0.0727) (0.1036) (0.7549)    $R^2 = 0.9156$

where $Y$ = 3-month Treasury bill rate
$P$ = expected rate of inflation
$\mathrm{Un}$ = seasonally adjusted unemployment rate
$M$ = changes in the monetary base
$\mathrm{Dum}$ = dummy, taking value of 1 for observations beginning July 1, 1979

a. Interpret these results. 

b. What has been the effect of interest rate deregulation? Do the results make 
economic sense? 

c. The coefficients of $P_t$, $\mathrm{Un}_t$, and $M_t$ are negative. Can you offer an economic rationale?

9.10. Refer to the piecewise regression discussed in the text. Suppose there not only is a 
change in the slope coefficient at X* but also the regression line jumps, as shown in 
Figure 9.7. How would you modify Eq. (9.8.1) to take into account the jump in the 
regression line at X*? 

*"Examination of Man-Hour Cost for Independent, Joint, and Divided Examination Programs," Journal of Bank Research, vol. 11, 1980, pp. 28-35. Note: The notations have been altered to conform with our notations.

†Sidney Langer, "Interest Rate Deregulation and Short-Term Interest Rates," unpublished term paper.
FIGURE 9.7 Discontinuous piecewise linear regression. [Figure not reproduced; vertical axis Y.]

9.11. Determinants of price per ounce of cola. Cathy Schaefer, a student of mine, estimated the following regression from cross-sectional data of 77 observations:*

$$P_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i$$

where $P_i$ = price per ounce of cola
$D_{1i}$ = 001 if discount store
      = 010 if chain store
      = 100 if convenience store
$D_{2i}$ = 10 if branded good
      = 01 if unbranded good
$D_{3i}$ = 0001 if 67.6-ounce (2-liter) bottle
      = 0010 if 28-33.8-ounce bottles (Note: 33.8 oz = 1 liter)
      = 0100 if 16-ounce bottle
      = 1000 if 12-ounce can
The results were as follows: 

$$\hat{P}_i = 0.0143 - 0.000004D_{1i} + 0.0090D_{2i} + 0.00001D_{3i}$$
se = (0.00001) (0.00011) (0.00000)
t = (-0.3837) (8.3927) (5.8125)
$R^2 = 0.6033$

Note: The standard errors are shown only to five decimal places. 

a. Comment on the way the dummies have been introduced in the model. 

b. Assuming the dummy setup is acceptable, how would you interpret the results? 

c. The coefficient of $D_3$ is positive and statistically significant. How do you rationalize this result?

9.12. From data for 101 countries on per capita income in dollars ($X$) and life expectancy in years ($Y$) in the early 1970s, Sen and Srivastava obtained the following regression results:†

$$\hat{Y}_i = -2.40 + 9.39\ln X_i - 3.36\,[D_i(\ln X_i - 7)]$$
se = (4.73) (0.859) (2.42)    $R^2 = 0.752$

where $D_i = 1$ if $\ln X_i > 7$, and $D_i = 0$ otherwise. Note: When $\ln X_i = 7$, $X_i \approx \$1{,}097$.
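Because the kink regressor is $D_i(\ln X_i - 7)$, the fitted line is continuous at $\ln X_i = 7$ and only its slope changes there. The Python sketch below evaluates the fitted equation with the reported coefficients and checks the $1,097 figure; the function and constant names are our own, chosen for illustration:

```python
import math

# Coefficients as reported in Exercise 9.12
B1, B2, B3 = -2.40, 9.39, -3.36
THRESHOLD = 7.0  # break point, measured in ln(per capita income)

def life_expectancy(income):
    """Fitted life expectancy (years) at per capita income `income` (dollars)."""
    ln_x = math.log(income)
    d = 1.0 if ln_x > THRESHOLD else 0.0
    return B1 + B2 * ln_x + B3 * d * (ln_x - THRESHOLD)

print(round(math.exp(THRESHOLD)))   # income at the break point, in dollars
print(B2, round(B2 + B3, 2))        # slope on ln X below and above the break
```

Below the break the slope on $\ln X_i$ is 9.39; above it the kink coefficient is added to that slope.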

*Cathy Schaefer, "Price Per Ounce of Cola Beverage as a Function of Place of Purchase, Size of 
Container, and Branded or Unbranded Product," unpublished term project. 
†Ashish Sen and Muni Srivastava, Regression Analysis: Theory, Methods, and Applications, Springer-Verlag, New York, 1990, p. 92. Notation changed.
a. What might be the reason(s) for introducing the income variable in the log form?

b. How would you interpret the coefficient 9.39 of $\ln X_i$?

c. What might be the reason for introducing the regressor $D_i(\ln X_i - 7)$? How do you explain this regressor verbally? And how do you interpret the coefficient -3.36 of this regressor (Hint: linear piecewise regression)?

d. Assuming per capita income of $1,097 as the dividing line between poorer and richer countries, how would you derive the regression for countries whose per capita income is less than $1,097 and the regression for countries whose per capita income is greater than $1,097?

e. What general conclusions do you draw from the regression result presented in this 
problem? 

9.13. Consider the following model:

$$Y_i = \beta_1 + \beta_2 D_i + u_i$$

where $D_i = 0$ for the first 20 observations and $D_i = 1$ for the remaining 30 observations. You are also told that $\operatorname{var}(u_i) = 300$.

a. How would you interpret $\beta_1$ and $\beta_2$?

b. What are the mean values of the two groups?

c. How would you compute the variance of $(\hat{\beta}_1 + \hat{\beta}_2)$? Note: You are given that $\operatorname{cov}(\hat{\beta}_1, \hat{\beta}_2) = -15$.

9.14. To assess the effect of state right-to-work laws (which do not require membership in the union as a precondition of employment) on union membership, the following regression results were obtained from the data for 50 states in the United States for 1982:*

$$\widehat{\mathrm{PVT}}_i = 19.8066 - 9.3917\,\mathrm{RTW}_i$$
t = (17.0352) (-5.1086)    $r^2 = 0.3522$

where PVT = percentage of private sector employees in unions, 1982, and RTW = 1 if right-to-work law exists, 0 otherwise. Note: In 1982, twenty states had right-to-work laws.

a. A priori, what is the expected relationship between PVT and RTW? 

b. Do the regression results support the prior expectations? 

c. Interpret the regression results. 

d. What was the average percent of private sector employees in unions in the states 
that did not have the right-to-work laws? 

9.15. In the following regression model:

$$Y_i = \beta_1 + \beta_2 D_i + u_i$$

$Y$ represents hourly wage in dollars and $D$ is the dummy variable, taking a value of 1 for a college graduate and a value of 0 for a high-school graduate. Using the OLS formulas given in Chapter 3, show that $\hat{\beta}_1 = \bar{Y}_{hg}$ and $\hat{\beta}_2 = \bar{Y}_{cg} - \bar{Y}_{hg}$, where the subscripts have the following meanings: hg = high-school graduate, cg = college graduate. In all, there are $n_1$ high-school graduates and $n_2$ college graduates, for a total sample of $n = n_1 + n_2$.
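A quick numerical check of the result (not a proof): apply the two-variable OLS formulas to made-up wage data with a 0/1 regressor and compare the estimates with the group means. The wage figures and the helper name `ols_simple` are hypothetical:

```python
from statistics import mean

def ols_simple(x, y):
    """Two-variable OLS: b2 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2), b1 = ybar - b2 * xbar."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2

# Hypothetical hourly wages: D = 0 for high-school, D = 1 for college graduates
wages_hg = [12.0, 14.5, 13.0, 15.5]   # n1 = 4
wages_cg = [20.0, 22.5, 19.5]         # n2 = 3
y = wages_hg + wages_cg
d = [0] * len(wages_hg) + [1] * len(wages_cg)

b1, b2 = ols_simple(d, y)
print(b1, mean(wages_hg))                   # intercept vs. high-school mean
print(b2, mean(wages_cg) - mean(wages_hg))  # slope vs. difference in means
```

The two numbers on each printed line agree up to floating-point rounding, whatever wage values are used.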

*The data used in the regression results were obtained from N. M. Meltz, "Interstate and
Interprovincial Differences in Union Density," Industrial Relations, vol. 28, no. 2, 1989, pp. 142-158. 
9.16. To study the rate of growth of population in Belize over the period 1970-1992, Mukherjee et al. estimated the following models:*

Model I:
$$\widehat{\ln \mathrm{Pop}_t} = 4.73 + 0.024\,t$$
t = (781.25) (54.71)

Model II:
$$\widehat{\ln \mathrm{Pop}_t} = 4.77 + 0.015\,t - 0.075D_t + 0.011(D_t t)$$
t = (2477.92) (34.01) (-17.03) (25.54)

where Pop = population in millions, t = trend variable, $D_t$ = 1 for observations beginning in 1978 and 0 before 1978, and ln stands for natural logarithm.

a. In Model I, what is the rate of growth of Belize’s population over the sample period? 

b. Are the population growth rates statistically different pre- and post-1978? How do 
you know? If they are different, what are the growth rates for 1972-1977 and 
1978-1992? 
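In a log-lin trend model the coefficient on t is the instantaneous growth rate, and $e^{\text{coefficient}} - 1$ is the corresponding compound rate per period. In Model II the trend slope is 0.015 when $D_t = 0$ and 0.015 + 0.011 when $D_t = 1$, because the $D_t t$ term shifts the slope. A small Python sketch of that arithmetic (the function name is ours; whether the two rates differ significantly is a separate question, answered by the t value of the interaction coefficient):

```python
import math

def growth_rates(trend_coef):
    """Growth rates (in percent) implied by the slope on t in ln(Y_t) = b1 + b2 * t.

    Returns (instantaneous rate, compound rate per period),
    i.e., 100 * b2 and 100 * (exp(b2) - 1).
    """
    return 100 * trend_coef, 100 * (math.exp(trend_coef) - 1)

print(growth_rates(0.024))          # Model I
print(growth_rates(0.015))          # Model II with D_t = 0 (before 1978)
print(growth_rates(0.015 + 0.011))  # Model II with D_t = 1 (1978 onward)
```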

Empirical Exercises 

9.17. Using the data given in Table 9.8, test the hypothesis that the error variances in the two subperiods 1958-IV to 1966-III and 1966-IV to 1971-II are the same.
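One way to carry out this test, sketched under the assumption that the regression is estimated separately for each subperiod: estimate each error variance as RSS divided by its degrees of freedom and form the ratio, which under normality and equal variances follows the F distribution. The function name and inputs below are placeholders, not figures computed from Table 9.8:

```python
def variance_ratio_f(rss1, df1, rss2, df2):
    """F statistic for equal error variances in two independent subsamples.

    Each variance is estimated as RSS / df, with df = n - k for that subperiod.
    The larger estimate goes in the numerator, so the statistic is at least 1;
    the returned pair is (numerator df, denominator df).
    """
    s1, s2 = rss1 / df1, rss2 / df2
    if s1 >= s2:
        return s1 / s2, (df1, df2)
    return s2 / s1, (df2, df1)

# Placeholder inputs, for illustration only
f_stat, dfs = variance_ratio_f(rss1=40.0, df1=20, rss2=30.0, df2=30)
print(f_stat, dfs)
```

Compare the statistic with the critical F value for the returned degrees of freedom.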

9.18. Using the methodology discussed in Chapter 8, compare the unrestricted and restricted 
regressions (9.7.3) and (9.7.4); that is, test for the validity of the imposed restrictions. 
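The restricted-versus-unrestricted comparison reduces to one formula. A Python sketch with hypothetical RSS values, where m is the number of restrictions and k the number of parameters in the unrestricted regression (the function name is ours):

```python
def restricted_f(rss_r, rss_ur, m, n, k):
    """F statistic for m linear restrictions.

    F = [(RSS_R - RSS_UR) / m] / [RSS_UR / (n - k)], where n is the sample size
    and k the number of parameters in the unrestricted regression. Under the
    null hypothesis and normal errors, F follows the F(m, n - k) distribution.
    """
    if rss_r < rss_ur:
        raise ValueError("the restricted RSS cannot be smaller than the unrestricted RSS")
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))

# Hypothetical values: 2 restrictions, 30 observations, 5 unrestricted parameters
print(restricted_f(rss_r=120.0, rss_ur=100.0, m=2, n=30, k=5))
```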

9.19. In the U.S. savings-income regression (9.5.4) discussed in the chapter, suppose that instead of using 1 and 0 values for the dummy variable you use $Z_t = a + bD_t$, where $D_t$ = 1 and 0, $a = 2$, and $b = 3$. Compare your results.

9.20. Continuing with the savings-income regression (9.5.4), suppose you were to assign $D_t = 0$ to observations in the second period and $D_t = 1$ to observations in the first period. How would the results shown in Eq. (9.5.4) change?

9.21. Use the data given in Table 9.2 and consider the following model:

$$\ln \mathrm{Savings}_t = \beta_1 + \beta_2 \ln \mathrm{Income}_t + \beta_3 \ln D_t + u_t$$

where ln stands for natural log and where $D_t$ = 1 for 1970-1981 and 10 for 1982-1995.

a. What is the rationale behind assigning dummy values as suggested? 

b. Estimate the preceding model and interpret your results. 

c. What are the intercept values of the savings function in the two subperiods and 
how do you interpret them? 

9.22. Refer to the quarterly appliance sales data given in Table 9.3. Consider the following model:

$$\mathrm{Sales}_t = \alpha_1 + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + u_t$$

where the $D$'s are dummies taking 1 and 0 values for quarters II through IV.

a. Estimate the preceding model for dishwashers, disposers, and washing machines individually.

b. How would you interpret the estimated slope coefficients?

c. How would you use the estimated $\alpha$'s to deseasonalize the sales data for individual appliances?
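When the only regressors are the intercept and the three quarterly dummies, the OLS fitted value for each observation is simply the mean of its quarter ($\hat{\alpha}_1$ for quarter I, $\hat{\alpha}_1 + \hat{\alpha}_j$ for quarter j). One common convention takes the residuals (actual minus fitted sales) and adds back the overall mean. A Python sketch of that convention with made-up sales figures (the function name is hypothetical):

```python
from statistics import mean

def deseasonalize_quarterly(sales, quarters):
    """Deseasonalize via Sales_t = a1 + a2*D2 + a3*D3 + a4*D4 + u_t.

    With only an intercept and quarterly dummies, the OLS fitted value of each
    observation is its quarter's mean, so the residual is sales minus that mean.
    The deseasonalized series is residual + overall mean.
    """
    quarter_means = {q: mean(s for s, qq in zip(sales, quarters) if qq == q)
                     for q in set(quarters)}
    overall = mean(sales)
    return [s - quarter_means[q] + overall for s, q in zip(sales, quarters)]

# Two hypothetical years of quarterly sales
sales = [10, 14, 12, 8, 12, 16, 14, 10]
quarters = [1, 2, 3, 4, 1, 2, 3, 4]
print(deseasonalize_quarterly(sales, quarters))
```

In this made-up series the seasonal pattern repeats exactly, so the deseasonalized values are flat within each year and only the year-to-year shift remains.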


*Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, London, 1998, pp. 372-375. Notations adapted.
TABLE 9.9
U.S. Presidential Elections, 1916-2004

Obs.  Year   V        W   D    G        I    N    P
1     1916   0.5168   0    1     2.229   1    3    4.252
2     1920   0.3612   1    0   -11.46    1    5   16.535
3     1924   0.4176   0   -1    -3.872  -1   10    5.161
4     1928   0.4118   0    0     4.623  -1    7    0.183
5     1932   0.5916   0   -1   -14.9    -1    4    7.069
6     1936   0.6246   0    1    11.921   1    9    2.362
7     1940   0.55     0    1     3.708   1    8    0.028
8     1944   0.5377   1    1     4.119   1   14    5.678
9     1948   0.5237   1    1     1.849   1    5    8.722
10    1952   0.446    0    0     0.627   1    6    2.288
11    1956   0.4224   0   -1    -1.527  -1    5    1.936
12    1960   0.5009   0    0     0.114  -1    5    1.932
13    1964   0.6134   0    1     5.054   1   10    1.247
14    1968   0.496    0    0     4.836   1    7    3.215
15    1972   0.3821   0   -1     6.278  -1    4    4.766
16    1976   0.5105   0    0     3.663  -1    4    7.657
17    1980   0.447    0    1    -3.789   1    5    8.093
18    1984   0.4083   0   -1     5.387  -1    7    5.403
19    1988   0.461    0    0     2.068  -1    6    3.272
20    1992   0.5345   0   -1     2.293  -1    1    3.692
21    1996   0.5474   0    1     2.918   1    3    2.268
22    2000   0.50265  0    0     1.219   1    8    1.605
23    2004   0.51233  0    1     2.69   -1    1    2.325


Year  Election year.
V     Democratic share of the two-party presidential vote.
W     Indicator variable (1 for the elections of 1920, 1944, and 1948, and 0 otherwise).
G     Growth rate of real per capita GDP in the first three quarters of the election year.
N     Number of quarters in the first 15 quarters of the administration in which the growth rate of real per capita GDP is greater than 3.2%.
P     Absolute value of the growth rate of the GDP deflator in the first 15 quarters of the administration.


9.23. Reestimate the model in Exercise 9.22 by adding the regressor, expenditure on 
durable goods. 

a. Is there a difference in the regression results you obtained in Exercise 9.22 and in 
this exercise? If so, what explains the difference? 

b. If there is seasonality in the durable goods expenditure data, how would you 
account for it? 

9.24. Table 9.9 gives data on quadrennial presidential elections in the United States from 
1916 to 2004.* 

a. Using the data given in Table 9.9, develop a suitable model to predict the 
Democratic share of the two-party presidential vote. 

b. How would you use this model to predict the outcome of a presidential election? 


*These data were originally compiled by Ray Fair of Yale University, who has been predicting the outcome of presidential elections for several years. The data are reproduced from Samprit Chatterjee, Ali S. Hadi, and Bertram Price, Regression Analysis by Example, 3d ed., John Wiley & Sons, New York, 2000, pp. 150-151 and updated from http://fairmodel.econ.yale.edu/rayfair/pdf/2006CHTM.HTM.
c. Chatterjee et al. suggested considering the following model as a trial model to predict presidential elections:

$$V = \beta_0 + \beta_1 I + \beta_2 D + \beta_3 W + \beta_4 (GI) + \beta_5 P + \beta_6 N + u$$

Estimate this model and comment on the results in relation to the results of the model you have chosen.

9.25. Refer to regression (9.6.4). Test the hypothesis that the rate of increase of average hourly earnings with respect to education differs by gender and race. (Hint: Use multiplicative dummies.)

9.26. Refer to the regression (9.3.1). How would you modify the model to find out if there 
is any interaction between the gender and the region of residence dummies? Present 
the results based on this model and compare them with those given in Eq. (9.3.1). 

9.27. In the model $Y_i = \beta_1 + \beta_2 D_i + u_i$, let $D_i = 0$ for the first 40 observations and $D_i = 1$ for the remaining 60 observations. You are told that $u_i$ has zero mean and a variance of 100. What are the mean values and variances of the two sets of observations?*

9.28. Refer to the U.S. savings-income regression discussed in the chapter. As an 
alternative to Eq. (9.5.1), consider the following model: 

$$\ln Y_t = \beta_1 + \beta_2 D_t + \beta_3 X_t + \beta_4 (D_t X_t) + u_t$$

where Y is savings and X is income.

a. Estimate the preceding model and compare the results with those given in 
Eq. (9.5.4). Which is a better model? 

b. How would you interpret the dummy coefficient in this model? 

c. As we will see in the chapter on heteroscedasticity, very often a log transformation of the dependent variable reduces heteroscedasticity in the data. See if this is the case in the present example by running the regression of log of Y on X for the two periods and see if the estimated error variances in the two periods are statistically the same. If they are, the Chow test can be used to pool the data in the manner indicated in the chapter.

9.29. Refer to the Indian wage earners example (Section 9.12) and the data in Table 9.7.† As a reminder, the variables are defined as follows:

WI = weekly wage income in rupees
Age = age in years
$D_{\text{sex}}$ = 1 for male workers and 0 for female workers
$DE_2$ = a dummy variable taking a value of 1 for workers with up to a primary education
$DE_3$ = a dummy variable taking a value of 1 for workers with up to a secondary education
$DE_4$ = a dummy variable taking a value of 1 for workers with higher education
DPT = a dummy variable taking a value of 1 for workers with permanent jobs and a value of 0 for temporary workers

The reference category is male workers with no primary education and temporary jobs. 


*This example is adapted from Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 347.

†The data come from Econometrics and Data Analysis for Developing Countries, by Chandan Mukherjee, Howard White, and Marc Wuyts, Routledge Press, London, 1998, in the Appendix.
In Section 9.12, interaction terms were created between the education variables ($DE_2$, $DE_3$, and $DE_4$) and the gender variable ($D_{\text{sex}}$). What happens if we create interaction terms between the education dummies and the permanent worker dummy variable (DPT)?

a. Estimate the model predicting ln WI containing age, gender, the education dummy variables, and three new interaction terms: $DE_2 \times DPT$, $DE_3 \times DPT$, and $DE_4 \times DPT$. Does there appear to be a significant interaction effect among the new terms?

b. Is there a significant difference between workers with an education level up to primary and those without a primary education? Assess this with respect to both the education dummy variable and the interaction term and explain the results. What about the difference between workers with a secondary level of education and those without a primary level of education? What about the difference between those with an education level beyond secondary, compared to those without a primary level of education?

c. Now assess the results of deleting the education dummies from the model. Do the 
interaction terms change in significance? 


Appendix 9A 


Semilogarithmic Regression with Dummy Regressor 

In Section 9.10 we noted that in models of the type

$$\ln Y_i = \beta_1 + \beta_2 D_i \qquad (1)$$

the relative change in Y (i.e., semielasticity), with respect to the dummy regressor taking values of 1 or 0, can be obtained as (antilog of estimated $\beta_2$) - 1 times 100, that is, as

$$\left(e^{\hat{\beta}_2} - 1\right) \times 100 \qquad (2)$$

The proof is as follows: Since ln and exp (= e) are inverse functions, we can write Eq. (1) as

$$\ln Y_i = \beta_1 + \ln\left(e^{\beta_2 D_i}\right) \qquad (3)$$

Now when D = 0, $e^{\beta_2 D_i} = 1$, and when D = 1, $e^{\beta_2 D_i} = e^{\beta_2}$. Exponentiating Eq. (3) gives $Y_i = e^{\beta_1} e^{\beta_2 D_i}$, so in going from state 0 to state 1, Y is multiplied by $e^{\beta_2}$; that is, its relative change is $(Y_{D=1} - Y_{D=0})/Y_{D=0} = e^{\beta_2} - 1$. After multiplication by 100, a relative change becomes a percentage change. Hence the percentage change is $(e^{\beta_2} - 1) \times 100$, as claimed. (For small $\beta_2$, the change in $\ln Y$, which is $\beta_2$ itself, approximates this relative change.) (Note: $\ln e = 1$, that is, the log of e to base e is 1, just as the log of 10 to base 10 is 1. Recall that log to base e is called the natural log and that log to base 10 is called the common log.)
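The proof can also be checked numerically. The Python sketch below compares the exact percentage change $(e^{\beta_2} - 1) \times 100$ with the naive reading $100\beta_2$, using as illustrative inputs the dummy coefficients reported in Exercise 9.8 (read there holding the other regressors constant); the function names are ours:

```python
import math

def dummy_pct_effect(beta2):
    """Exact percentage change in Y when D goes from 0 to 1 in ln Y = b1 + b2 * D."""
    return (math.exp(beta2) - 1.0) * 100.0

def check_by_direct_computation(beta1, beta2):
    """Compute Y at D = 0 and at D = 1, then the percentage change directly."""
    y0 = math.exp(beta1 + beta2 * 0)
    y1 = math.exp(beta1 + beta2 * 1)
    return (y1 - y0) / y0 * 100.0

# Dummy coefficients as reported in Exercise 9.8, used only as example inputs
for b in (-0.1755, 0.2799, 0.5634, -0.2572):
    print(f"beta2 = {b:+.4f}: naive 100*beta2 = {100 * b:+.2f}%, "
          f"exact = {dummy_pct_effect(b):+.2f}%")
```

The intercept cancels in the direct computation, which is why any value of $\beta_1$ gives the same answer. The gap between the naive and exact figures is small near zero and grows with $|\beta_2|$.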



Relaxing the Assumptions of the Classical Model



In Part 1 we considered at length the classical normal linear regression model and showed 
how it can be used to handle the twin problems of statistical inference, namely, estimation 
and hypothesis testing, as well as the problem of prediction. But recall that this model is 
based on several simplifying assumptions, which are as follows. 


Assumption 1. The regression model is linear in the parameters.
Assumption 2. The values of the regressors, the X's, are fixed, or X values are independent of the error term. Here, this means we require zero covariance between $u_i$ and each X variable.
Assumption 3. For given X's, the mean value of disturbance $u_i$ is zero.
Assumption 4. For given X's, the variance of $u_i$ is constant or homoscedastic.
Assumption 5. For given X's, there is no autocorrelation, or serial correlation, between the disturbances.
Assumption 6. The number of observations n must be greater than the number of parameters to be estimated.
Assumption 7. There must be sufficient variation in the values of the X variables.


We are also including the following 3 assumptions in this part of the text: 


Assumption 8. There is no exact collinearity between the X variables. 
Assumption 9. The model is correctly specified, so there is no specification bias. 
Assumption 10. The stochastic (disturbance) term $u_i$ is normally distributed.


Before proceeding further, let us note that most textbooks list fewer than 10 assumptions. 
For example, assumptions 6 and 7 are taken for granted rather than spelled out explicitly. We 
decided to state them explicitly because distinguishing between the assumptions required 
for ordinary least squares (OLS) to have desirable statistical properties (such as BLUE) and 
the conditions required for OLS to be useful seems sensible. For example, OLS estimators 
are BLUE (best linear unbiased estimators) even if assumption 7 is not satisfied. But in that 
case the standard errors of the OLS estimators will be large relative to their coefficients 
(i.e., the t ratios will be small), thereby making it difficult to assess the contribution of one
or more regressors to the explained sum of squares. 
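The link between Assumption 7 and large standard errors is visible in the two-variable formula $\operatorname{se}(\hat{\beta}_2) = \sigma/\sqrt{\sum (X_i - \bar{X})^2}$. A Python sketch with a hypothetical σ and two made-up X series, one widely spread and one tightly clustered (the function name is ours):

```python
import math

def se_slope(x, sigma):
    """Standard error of the OLS slope in Y = b1 + b2*X + u: sigma / sqrt(sum((x - xbar)^2))."""
    xbar = sum(x) / len(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no variation in X: the slope cannot be estimated")
    return sigma / math.sqrt(sxx)

sigma = 2.0  # hypothetical error standard deviation
wide = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]                         # ample variation in X
narrow = [9.6, 9.7, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2, 10.3, 10.4]  # little variation in X
print(se_slope(wide, sigma), se_slope(narrow, sigma))
```

With the same σ and the same sample size, the clustered X values give a standard error more than twenty times larger, so the corresponding t ratio shrinks by the same factor.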

As Wetherill notes, in practice two major types of problems arise in applying the classical linear regression model: (1) those due to assumptions about the specification of the model and about the disturbances $u_i$ and (2) those due to assumptions about the data. 1 In the first category are Assumptions 1, 2, 3, 4, 5, 9, and 10. Those in the second category include
Assumptions 6, 7, and 8. In addition, data problems, such as outliers (unusual or untypical 
observations) and errors of measurement in the data, also fall into the second category. 

With respect to problems arising from the assumptions about disturbances and model specifications, three major questions arise: (1) How severe must the departure be from a particular assumption before it really matters? For example, if the $u_i$ are not exactly normally distributed, what level of departure from this assumption can one accept before the BLUE property of the OLS estimators is destroyed? (2) How do we find out whether a particular assumption is in fact violated in a concrete case? Thus, how does one find out if the disturbances are normally distributed in a given application? We have already discussed the Anderson-Darling $A^2$ statistic and Jarque-Bera tests of normality. (3) What remedial measures can we take if one or more of the assumptions are false? For example, if the assumption of homoscedasticity is found to be false in an application, what do we do then?

With regard to problems attributable to assumptions about the data, we also face similar 
questions. (1) How serious is a particular problem? For example, is multicollinearity so 
severe that it makes estimation and inference very difficult? (2) How do we find out the 
severity of the data problem? For example, how do we decide whether the inclusion or 
exclusion of an observation or observations that may represent outliers will make a 
tremendous difference in the analysis? (3) Can some of the data problems be easily remedied? For example, can one have access to the original data to find out the sources of errors
of measurement in the data? 

Unfortunately, satisfactory answers cannot be given to all these questions. In the rest of 
Part 2 we will look at some of the assumptions more critically, but not all will receive full 
scrutiny. In particular, we will not discuss in depth the following: Assumptions 2, 3, and 10.
The reasons are as follows: 

Assumption 2: Fixed versus Stochastic Regressors 

Remember that our regression analysis is based on the assumption that the regressors are 
nonstochastic and assume fixed values in repeated sampling. There is a good reason for this 
strategy. Unlike scientists in the physical sciences, as noted in Chapter 1, economists generally have no control over the data they use. More often than not, economists depend on secondary data, that is, data collected by someone else, such as the government and private
organizations. Therefore, the practical strategy to follow is to assume that for the problem at 
hand the values of the explanatory variables are given even though the variables themselves 
may be intrinsically stochastic or random. Hence, the results of the regression analysis are 
conditional upon these given values. 

1 G. Barrie Wetherill, Regression Analysis with Applications, Chapman and Hall, New York, 1986, pp. 14-15.

But suppose that we cannot regard the X's as truly nonstochastic or fixed. This is the case of random or stochastic regressors. Now the situation is rather involved. The $u_i$, by assumption, are stochastic. If the X's too are stochastic, then we must specify how the X's and $u_i$ are distributed. If we are willing to make Assumption 2 (i.e., the X's, although random, are distributed independently of, or at least uncorrelated with, $u_i$), then for all practical purposes we can continue to operate as if the X's were nonstochastic. As Kmenta notes:

Thus, relaxing the assumption that X is nonstochastic and replacing it by the assumption that 
X is stochastic but independent of [u] does not change the desirable properties and feasibility
of least squares estimation. 2 

Therefore, we will retain Assumption 2 until we come to deal with simultaneous equations models in Part 4. 3 Also, a brief discussion of nonstochastic regressors will be given in Chapter 13.

Assumption 3: Zero Mean Value of $u_i$

Recall the k-variable linear regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (1)$$

Let us now assume that

$$E(u_i \mid X_{2i}, X_{3i}, \ldots, X_{ki}) = w \qquad (2)$$

where w is a constant; note in the standard model w = 0, but now we let it be any constant. Taking the conditional expectation of Eq. (1), we obtain

$$\begin{aligned} E(Y_i \mid X_{2i}, X_{3i}, \ldots, X_{ki}) &= \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + w \\ &= (\beta_1 + w) + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} \\ &= \alpha + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} \end{aligned} \qquad (3)$$

where $\alpha = (\beta_1 + w)$ and where in taking the expectations one should note that the X's are treated as constants. (Why?)

Therefore, if Assumption 3 is not fulfilled, we see that we cannot estimate the original intercept $\beta_1$; what we obtain is $\alpha$, which contains $\beta_1$ and $E(u_i) = w$. In short, we obtain a biased estimate of $\beta_1$.

But as we have noted on many occasions, in many practical situations the intercept term, 
$\beta_1$, is of little importance; the more meaningful quantities are the slope coefficients, which
remain unaffected even if Assumption 3 is violated. 4 Besides, in many applications the 
intercept term has no physical interpretation. 
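A small simulation illustrates the argument for a constant w: draw disturbances whose mean is w rather than zero, fit OLS, and compare the estimates with the true parameters. Everything below (true values, sample size, seed, the helper name `ols_simple`) is an assumption chosen for illustration; it does not cover the case in which the mean differs across observations:

```python
import random

def ols_simple(x, y):
    """Two-variable OLS estimates (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

random.seed(42)                    # fixed seed so the run is reproducible
beta1, beta2, w = 1.0, 0.5, 3.0    # hypothetical true values; E(u_i) = w, not 0
n = 20000
x = [random.uniform(0, 10) for _ in range(n)]
y = [beta1 + beta2 * xi + random.gauss(w, 1.0) for xi in x]

a_hat, b_hat = ols_simple(x, y)
print(f"intercept {a_hat:.3f} (beta1 + w = {beta1 + w}), slope {b_hat:.3f} (beta2 = {beta2})")
```

The estimated intercept settles near $\beta_1 + w$ rather than $\beta_1$, while the slope stays near $\beta_2$.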


2 Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, p. 338. (Emphasis in the 
original.) 

3 A technical point may be noted here. Instead of the strong assumption that the X's and u are independent, we may use the weaker assumption that the values of X variables and u are uncorrelated contemporaneously (i.e., at the same point in time). In this case OLS estimators may be biased but they are consistent, that is, as the sample size increases indefinitely, the estimators converge on their true values. If, however, the X's and u are contemporaneously correlated, the OLS estimators are biased as well as inconsistent. In Chapter 17 we will show how the method of instrumental variables can sometimes be used to obtain consistent estimators in this situation.

4 It is very important to note that this statement is true only if $E(u_i) = w$ for each i. However, if $E(u_i) = w_i$, that is, a different constant for each i, the partial slope coefficients may be biased as well as inconsistent. In this case violation of Assumption 3 will be critical. For proof and further details, see Peter Schmidt, Econometrics, Marcel Dekker, New York, 1976, pp. 36-39.
Assumption 10: Normality of u

This assumption is not essential if our objective is estimation only. As noted in Chapter 3, the OLS estimators are BLUE regardless of whether the $u_i$ are normally distributed or not. With the normality assumption, however, we were able to establish that the OLS estimators of the regression coefficients follow the normal distribution, that $(n - k)\hat{\sigma}^2/\sigma^2$ has the $\chi^2$ distribution, and that one could use the t and F tests to test various statistical hypotheses regardless of the sample size.

But what happens if the $u_i$ are not normally distributed? We then rely on the following extension of the central limit theorem; recall that it was the central limit theorem we invoked to justify the normality assumption in the first place:

If the disturbances [$u_i$] are independently and identically distributed with zero mean and [constant] variance $\sigma^2$ and if the explanatory variables are constant in repeated samples, the [O]LS coefficient estimators are asymptotically normally distributed with means equal to the corresponding β's. 5

Therefore, the usual test procedures—the t and F tests—are still valid asymptotically, 
that is, in the large sample, but not in the finite or small samples. 

The fact that if the disturbances are not normally distributed the OLS estimators are still normally distributed asymptotically (under the assumption of homoscedastic variance and fixed X's) is of little comfort to practicing economists, who often do not have the luxury of large-sample data. Therefore, the normality assumption becomes extremely important for the purposes of hypothesis testing and prediction. Hence, with the twin problems of estimation and hypothesis testing in mind, and given the fact that small samples are the rule rather than the exception in most economic analyses, we shall continue to use the normality assumption. 6 (But see Chapter 13, Section 13.12.)

Of course, this means that when we deal with a finite sample, we must explicitly test for 
the normality assumption. We have already considered the Anderson-Darling and the 
Jarque-Bera tests of normality. The reader is strongly urged to apply these or other tests 
of normality to regression residuals. Keep in mind that in finite samples without the normality assumption the usual t and F statistics may not follow the t and F distributions.
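For reference, the Jarque-Bera statistic combines the sample skewness S and kurtosis K of the residuals as $JB = n[S^2/6 + (K - 3)^2/24]$; under the normality hypothesis it is asymptotically $\chi^2$ with 2 df (5 percent critical value about 5.99). A Python sketch using moment-based estimates of S and K, applied to made-up residuals; since the test is a large-sample one, a sample this small is for illustration only:

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic: n * (S**2 / 6 + (K - 3)**2 / 24).

    S and K are the moment-based sample skewness and kurtosis of the residuals.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)

# Made-up residuals, for illustration only
residuals = [0.5, -1.2, 0.3, 2.8, -0.7, -0.4, 0.1, -1.0, 0.6, -0.2]
jb = jarque_bera(residuals)
print(f"JB = {jb:.3f}; reject normality at 5% if JB > 5.99")
```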

We are left with Assumptions 1, 4, 5, 6, 7, 8, and 9. Assumptions 6, 7, and 8 are closely 
related and are discussed in the chapter on multicollinearity (Chapter 10). Assumption 4 is 
discussed in the chapter on heteroscedasticity (Chapter 11). Assumption 5 is discussed in 
the chapter on autocorrelation (Chapter 12). Assumption 9 is discussed in the chapter 
on model specification and diagnostic testing (Chapter 13). Because of its specialized 
nature and mathematical demands, Assumption 1 is discussed as a special topic in Part 3 
(Chapter 14). 

For pedagogical reasons, in each of these chapters we follow a common format, namely, 
(1) identify the nature of the problem, (2) examine its consequences, (3) suggest methods 
of detecting it, and (4) consider remedial measures so that they may lead to estimators that 
possess the desirable statistical properties discussed in Part 1. 


5 Henri Theil, Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, NJ, 1978, p. 240. It must be noted the assumptions of fixed X's and constant $\sigma^2$ are crucial for this result.

6 In passing, note that the effects of departure from normality and related topics are often discussed under the topic of robust estimation in the literature, a topic beyond the scope of this book.
A cautionary note is in order: As noted earlier, satisfactory answers to all the problems arising out of the violation of the assumptions of the classical linear regression model (CLRM) do not exist. Moreover, there may be more than one solution to a particular problem, and often it is not clear which method is best. Besides, in a particular application more than one violation of the CLRM may be involved. Thus, specification bias, multicollinearity, and heteroscedasticity may coexist in an application, and there is no single omnipotent
test that will solve all the problems simultaneously. 7 Furthermore, a particular test that was 
popular at one time may not be in vogue later because somebody found a flaw in the earlier 
test. But this is how science progresses. Econometrics is no exception. 


7 This is not for lack of trying. See A. K. Bera and C. M. Jarque, "Efficient Tests for Normality, 
Homoscedasticity and Serial Independence of Regression Residuals: Monte Carlo Evidence," 
Economics Letters, vol. 7, 1981, pp. 313-318.


Chapter 10

Multicollinearity: What Happens If the Regressors Are Correlated?


There is no pair of words that is more misused both in econometrics texts and in the applied 
literature than the pair “multi-collinearity problem.” That many of our explanatory variables are 
highly collinear is a fact of life. And it is completely clear that there are experimental designs 
X'X [i.e., data matrix] which would be much preferred to the designs the natural experiment has 
provided us [i.e., the sample at hand]. But a complaint about the apparent malevolence of nature 
is not at all constructive, and the ad hoc cures for a bad design, such as stepwise regression or 
ridge regression, can be disastrously inappropriate. Better that we should rightly accept the fact 
that our non-experiments [i.e., data not collected by designed experiments] are sometimes not 
very informative about parameters of interest. 1 

Assumption 8 of the classical linear regression model (CLRM) is that there is no 
multicollinearity among the regressors included in the regression model. In this chapter 
we take a critical look at this assumption by seeking answers to the following questions: 

1. What is the nature of multicollinearity? 

2. Is multicollinearity really a problem? 

3. What are its practical consequences? 

4. How does one detect it? 

5. What remedial measures can be taken to alleviate the problem of multicollinearity? 

In this chapter we also discuss Assumption 6 of the CLRM, namely, that the number of observations in the sample must be greater than the number of regressors, and Assumption 7, which requires that there be sufficient variability in the values of the regressors, for they are intimately related to the assumption of no multicollinearity. Arthur Goldberger has christened Assumption 6 as the problem of micronumerosity, 2 which simply means small sample size.

1 Edward E. Leamer, "Model Choice and Specification Analysis," in Zvi Griliches and Michael D. Intriligator, eds., Handbook of Econometrics, vol. I, North Holland Publishing Company, Amsterdam, 1983, pp. 300-301.

10.1 The Nature of Multicollinearity 

The term multicollinearity is due to Ragnar Frisch. 3 Originally it meant the existence of a "perfect," or exact, linear relationship among some or all explanatory variables of a regression model. 4 For the k-variable regression involving explanatory variables $X_1, X_2, \ldots, X_k$ (where $X_1 = 1$ for all observations to allow for the intercept term), an exact linear relationship is said to exist if the following condition is satisfied:

$$
\lambda_1 X_{1i} + \lambda_2 X_{2i} + \cdots + \lambda_k X_{ki} = 0 \tag{10.1.1}
$$

where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are constants such that not all of them are zero simultaneously. 5

Today, however, the term multicollinearity is used in a broader sense to include the case of perfect multicollinearity, as shown by Eq. (10.1.1), as well as the case where the X variables are intercorrelated but not perfectly so, as follows: 6

$$
\lambda_1 X_{1i} + \lambda_2 X_{2i} + \cdots + \lambda_k X_{ki} + v_i = 0 \tag{10.1.2}
$$

where $v_i$ is a stochastic error term.

To see the difference between perfect and less than perfect multicollinearity, assume, for example, that $\lambda_2 \neq 0$. Then Eq. (10.1.1) can be written as

$$
X_{2i} = -\frac{\lambda_1}{\lambda_2}X_{1i} - \frac{\lambda_3}{\lambda_2}X_{3i} - \cdots - \frac{\lambda_k}{\lambda_2}X_{ki} \tag{10.1.3}
$$

which shows how $X_2$ is exactly linearly related to other variables or how it can be derived from a linear combination of other X variables. In this situation, the coefficient of correlation between the variable $X_2$ and the linear combination on the right side of Eq. (10.1.3) is bound to be unity.

Similarly, if $\lambda_2 \neq 0$, Eq. (10.1.2) can be written as

$$
X_{2i} = -\frac{\lambda_1}{\lambda_2}X_{1i} - \frac{\lambda_3}{\lambda_2}X_{3i} - \cdots - \frac{\lambda_k}{\lambda_2}X_{ki} - \frac{1}{\lambda_2}v_i \tag{10.1.4}
$$

which shows that $X_2$ is not an exact linear combination of other X's because it is also determined by the stochastic error term $v_i$.


2 See his A Course in Econometrics, Harvard University Press, Cambridge, Mass., 1991, p. 249. 

3 Ragnar Frisch, Statistical Confluence Analysis by Means of Complete Regression Systems, Institute of 
Economics, Oslo University, publ. no. 5, 1934. 

4 Strictly speaking, multicollinearity refers to the existence of more than one exact linear relationship, 
and collinearity refers to the existence of a single linear relationship. But this distinction is rarely 
maintained in practice, and multicollinearity refers to both cases. 

5 The chances of one's obtaining a sample of values where the regressors are related in this fashion are indeed very small in practice except by design when, for example, the number of observations is smaller than the number of regressors or if one falls into the "dummy variable trap" as discussed in Chapter 9. See Exercise 10.2.

6 If there are only two explanatory variables, intercorrelation can be measured by the zero-order or simple correlation coefficient. But if there are more than two X variables, intercorrelation can be measured by the partial correlation coefficients or by the multiple correlation coefficient R of one X variable with all other X variables taken together.
As a numerical example, consider the following hypothetical data:

  X2     X3     X3*
  10     50     52
  15     75     75
  18     90     97
  24    120    129
  30    150    152

It is apparent that $X_{3i} = 5X_{2i}$. Therefore, there is perfect collinearity between $X_2$ and $X_3$, since the coefficient of correlation $r_{23}$ is unity. The variable $X_3^*$ was created from $X_3$ by simply adding to it the following numbers, which were taken from a table of random numbers: 2, 0, 7, 9, 2. Now there is no longer perfect collinearity between $X_2$ and $X_3^*$. However, the two variables are highly correlated: calculations show that the coefficient of correlation between them is 0.9959.
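The two correlation coefficients quoted above are easy to check. The following is a minimal sketch that assumes NumPy is available (the text itself names no software); `np.corrcoef` returns the matrix of simple correlation coefficients, and the data are the hypothetical values tabulated above.

```python
import numpy as np

# Hypothetical data from the numerical example in the text
x2 = np.array([10, 15, 18, 24, 30], dtype=float)
x3 = 5 * x2                                       # exact linear relation X3 = 5 X2
noise = np.array([2, 0, 7, 9, 2], dtype=float)    # the "random numbers" added to X3
x3_star = x3 + noise                              # X3* = 52, 75, 97, 129, 152

r_23 = np.corrcoef(x2, x3)[0, 1]                  # perfect collinearity
r_23_star = np.corrcoef(x2, x3_star)[0, 1]        # high but imperfect collinearity

print(f"r(X2, X3)  = {r_23:.4f}")
print(f"r(X2, X3*) = {r_23_star:.4f}")
```

Running it should print 1.0000 and 0.9959: adding a few small numbers breaks the exact linear dependence, but leaves the two regressors almost perfectly correlated.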

The preceding algebraic approach to multicollinearity can be portrayed succinctly by the Ballentine (recall Figure 3.8, reproduced in Figure 10.1). In this figure the circles $Y$, $X_2$, and $X_3$ represent, respectively, the variations in $Y$ (the dependent variable) and in $X_2$ and $X_3$ (the explanatory variables). The degree of collinearity can be measured by the extent of the overlap (shaded area) of the $X_2$ and $X_3$ circles. In Figure 10.1a there is no overlap between $X_2$ and $X_3$, and hence no collinearity. In Figures 10.1b through 10.1e there is a "low" to "high" degree of collinearity: the greater the overlap between $X_2$ and $X_3$ (i.e., the larger the shaded area), the higher the degree of collinearity. In the extreme, if $X_2$ and $X_3$ were to overlap completely (or if $X_2$ were completely inside $X_3$, or vice versa), collinearity would be perfect.

FIGURE 10.1 The Ballentine view of multicollinearity.

In passing, note that multicollinearity, as we have defined it, refers only to linear relationships among the X variables. It does not rule out nonlinear relationships among them. For example, consider the following regression model:

$$
Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i \tag{10.1.5}
$$

where, say, $Y$ = total cost of production and $X$ = output. The variables $X_i^2$ (output squared) and $X_i^3$ (output cubed) are obviously functionally related to $X_i$, but the relationship is nonlinear. Strictly, therefore, models such as Eq. (10.1.5) do not violate the assumption of no multicollinearity. However, in concrete applications, the conventionally measured correlation coefficient will show $X_i$, $X_i^2$, and $X_i^3$ to be highly correlated, which, as we shall show, will make it difficult to estimate the parameters of Eq. (10.1.5) with great precision (i.e., with small standard errors).
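To see how strong these linear correlations can be even though the relationship is exactly nonlinear, the sketch below computes the simple correlations among $X$, $X^2$, and $X^3$ for a hypothetical output series $X = 1, 2, \ldots, 10$. The values are illustrative, not taken from the text, and NumPy is assumed.

```python
import numpy as np

# Hypothetical output levels (illustration only)
x = np.arange(1, 11, dtype=float)
regressors = np.vstack([x, x**2, x**3])   # rows: X, X^2, X^3

# 3x3 matrix of simple (zero-order) correlation coefficients
corr = np.corrcoef(regressors)
labels = ["X", "X^2", "X^3"]
for i in range(3):
    for j in range(i + 1, 3):
        print(f"r({labels[i]}, {labels[j]}) = {corr[i, j]:.4f}")
```

For this range of $X$ every pairwise correlation exceeds 0.9, so a regression like Eq. (10.1.5) inherits high collinearity even though no regressor is an exact linear function of the others.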

Why does the classical linear regression model assume that there is no multicollinearity among the X's? The reasoning is this: If multicollinearity is perfect in the sense of Eq. (10.1.1), the regression coefficients of the X variables are indeterminate and their standard errors are infinite. If multicollinearity is less than perfect, as in Eq. (10.1.2), the regression coefficients, although determinate, possess large standard errors (in relation to the coefficients themselves), which means the coefficients cannot be estimated with great precision or accuracy. The proofs of these statements are given in the following sections.

There are several sources of multicollinearity. As Montgomery and Peck note, multicollinearity may be due to the following factors: 7

1. The data collection method employed. For example, sampling over a limited range of the values taken by the regressors in the population.

2. Constraints on the model or in the population being sampled. For example, in the regression of electricity consumption on income ($X_2$) and house size ($X_3$) there is a physical constraint in the population in that families with higher incomes generally have larger homes than families with lower incomes.

3. Model specification. For example, adding polynomial terms to a regression model, especially when the range of the X variable is small.

4. An overdetermined model. This happens when the model has more explanatory variables than the number of observations. This could happen in medical research where there may be a small number of patients about whom information is collected on a large number of variables.

An additional reason for multicollinearity, especially in time series data, may be that the regressors included in the model share a common trend, that is, they all increase or decrease over time. Thus, in the regression of consumption expenditure on income, wealth, and population, the regressors income, wealth, and population may all be growing over time at more or less the same rate, leading to collinearity among these variables.
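The common-trend mechanism can be illustrated with two artificial series. In the sketch below, which uses hypothetical numbers rather than data from the text and assumes NumPy, "income" and "wealth" each follow a linear time trend plus a small, unrelated deterministic fluctuation. Because the trends dominate, the two series end up very highly correlated, although neither is constructed from the other.

```python
import numpy as np

t = np.arange(30, dtype=float)   # 30 hypothetical years

# Hypothetical series: a shared upward trend plus unrelated wiggles
income = 100 + 5 * t + 2 * np.sin(t)
wealth = 900 + 40 * t + 15 * np.cos(t)

r = np.corrcoef(income, wealth)[0, 1]
print(f"r(income, wealth) = {r:.3f}")
```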

7 Douglas Montgomery and Elizabeth Peck, Introduction to Linear Regression Analysis, John Wiley & Sons, New York, 1982, pp. 289-290. See also R. L. Mason, R. F. Gunst, and J. T. Webster, "Regression Analysis and Problems of Multicollinearity," Communications in Statistics A, vol. 4, no. 3, 1975, pp. 277-292; R. F. Gunst and R. L. Mason, "Advantages of Examining Multicollinearities in Regression Analysis," Biometrics, vol. 33, 1977, pp. 249-260.
10.2 Estimation in the Presence of Perfect Multicollinearity

It was stated previously that in the case of perfect multicollinearity the regression coefficients remain indeterminate and their standard errors are infinite. This fact can be demonstrated readily in terms of the three-variable regression model. Using the deviation form, where all the variables are expressed as deviations from their sample means, we can write the three-variable regression model as

$$
y_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \hat u_i \tag{10.2.1}
$$

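Now from Chapter 7 we obtain

$$
\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i}x_{3i}\right)^2} \tag{7.4.7}
$$

$$
\hat\beta_3 = \frac{\left(\sum y_i x_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_i x_{2i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i}x_{3i}\right)^2} \tag{7.4.8}
$$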


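Assume that $x_{3i} = \lambda x_{2i}$, where $\lambda$ is a nonzero constant (e.g., 2, 4, 1.8, etc.). Substituting this into Eq. (7.4.7), we obtain

$$
\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\lambda^2\sum x_{2i}^2\right) - \left(\lambda\sum y_i x_{2i}\right)\left(\lambda\sum x_{2i}^2\right)}{\left(\sum x_{2i}^2\right)\left(\lambda^2\sum x_{2i}^2\right) - \lambda^2\left(\sum x_{2i}^2\right)^2} = \frac{0}{0} \tag{10.2.2}
$$

which is an indeterminate expression. The reader can verify that $\hat\beta_3$ is also indeterminate. 8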

Why do we obtain the result shown in Eq. (10.2.2)? Recall the meaning of $\hat\beta_2$: It gives the rate of change in the average value of $Y$ as $X_2$ changes by a unit, holding $X_3$ constant. But if $X_3$ and $X_2$ are perfectly collinear, there is no way $X_3$ can be kept constant: As $X_2$ changes, so does $X_3$, by the factor $\lambda$. What it means, then, is that there is no way of disentangling the separate influences of $X_2$ and $X_3$ from the given sample: For practical purposes $X_2$ and $X_3$ are indistinguishable. In applied econometrics this problem is most damaging, since the entire intent is to separate the partial effects of each X upon the dependent variable.

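To see this differently, let us substitute $x_{3i} = \lambda x_{2i}$ into Eq. (10.2.1) and obtain the following [see also Eq. (7.1.12)]:

$$
\begin{aligned}
y_i &= \hat\beta_2 x_{2i} + \hat\beta_3(\lambda x_{2i}) + \hat u_i \\
&= (\hat\beta_2 + \lambda\hat\beta_3)x_{2i} + \hat u_i \\
&= \hat\alpha x_{2i} + \hat u_i
\end{aligned} \tag{10.2.3}
$$

where

$$
\hat\alpha = (\hat\beta_2 + \lambda\hat\beta_3) \tag{10.2.4}
$$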


8 Another way of seeing this is as follows: By definition, the coefficient of correlation between $X_2$ and $X_3$ is $r_{23} = \sum x_{2i}x_{3i}/\sqrt{\sum x_{2i}^2 \sum x_{3i}^2}$. If $r_{23}^2 = 1$, i.e., perfect collinearity between $X_2$ and $X_3$, the denominator of Eq. (7.4.7) will be zero, making estimation of $\beta_2$ (or of $\beta_3$) impossible.
Applying the usual OLS formula to Eq. (10.2.3), we get

$$
\hat\alpha = (\hat\beta_2 + \lambda\hat\beta_3) = \frac{\sum x_{2i} y_i}{\sum x_{2i}^2} \tag{10.2.5}
$$

Therefore, although we can estimate $\alpha$ uniquely, there is no way to estimate $\beta_2$ and $\beta_3$ uniquely; mathematically

$$
\hat\alpha = \hat\beta_2 + \lambda\hat\beta_3 \tag{10.2.6}
$$

gives us only one equation in two unknowns (note $\lambda$ is given), and there is an infinity of solutions to Eq. (10.2.6) for given values of $\hat\alpha$ and $\lambda$. To put this idea in concrete terms, let $\hat\alpha = 0.8$ and $\lambda = 2$. Then we have

$$
0.8 = \hat\beta_2 + 2\hat\beta_3 \tag{10.2.7}
$$

or

$$
\hat\beta_2 = 0.8 - 2\hat\beta_3 \tag{10.2.8}
$$

Now choose a value of $\hat\beta_3$ arbitrarily, and we will have a solution for $\hat\beta_2$. Choose another value for $\hat\beta_3$, and we will have another solution for $\hat\beta_2$. No matter how hard we try, there is no unique value for $\hat\beta_2$.

The upshot of the preceding discussion is that in the case of perfect multicollinearity one cannot get a unique solution for the individual regression coefficients. But notice that one can get a unique solution for linear combinations of these coefficients. The linear combination $(\beta_2 + \lambda\beta_3)$ is uniquely estimated by $\hat\alpha$, given the value of $\lambda$. 9

In passing, note that in the case of perfect multicollinearity the variances and standard errors of $\hat\beta_2$ and $\hat\beta_3$ individually are infinite. (See Exercise 10.21.)
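The argument of Eqs. (10.2.5) through (10.2.8) can be replayed numerically. The sketch below assumes NumPy and uses made-up deviation-form data, chosen so that $\hat\alpha = 0.8$ and $\lambda = 2$ as in Eq. (10.2.7). It shows that the cross-product matrix of the regressors is singular, that different $(\hat\beta_2, \hat\beta_3)$ pairs satisfying Eq. (10.2.8) give identical fitted values and residual sums of squares, and that a least-squares routine can only return one arbitrary member of that family (here NumPy's minimum-norm solution), not the individual coefficients.

```python
import numpy as np

# Hypothetical data in deviation form (each column sums to zero)
x2 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([-1.5, -1.0, 0.2, 0.6, 1.7])
lam = 2.0
x3 = lam * x2   # perfect collinearity: x3 = lambda * x2

X = np.column_stack([x2, x3])
print("rank of X'X:", np.linalg.matrix_rank(X.T @ X))   # 1, not 2: X'X is singular

# Eq. (10.2.5): alpha is estimable from the regression of y on x2 alone
alpha = (x2 @ y) / (x2 @ x2)
print("alpha-hat:", alpha)

# Every (b2, b3) with b2 + lam * b3 = alpha fits equally well, Eq. (10.2.8)
fits = []
for b3 in (0.0, 1.0, -5.0):
    b2 = alpha - lam * b3
    fits.append(b2 * x2 + b3 * x3)
    rss = np.sum((y - fits[-1]) ** 2)
    print(f"b2 = {b2:6.2f}, b3 = {b3:6.2f}, RSS = {rss:.4f}")

# lstsq detects the rank deficiency and returns the minimum-norm member
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("lstsq rank:", rank, "minimum-norm solution:", coef)
```

The minimum-norm answer $(0.16, 0.32)$ is just a convention of the algorithm; it satisfies $\hat\beta_2 + 2\hat\beta_3 = 0.8$ like every other point on the line, which is exactly the estimable function mentioned above.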

10.3 Estimation in the Presence of “High” but “Imperfect” Multicollinearity

The perfect multicollinearity situation is a pathological extreme. Generally, there is no exact linear relationship among the X variables, especially in data involving economic time series. Thus, turning to the three-variable model in the deviation form given in Eq. (10.2.1), instead of exact multicollinearity, we may have

$$
x_{3i} = \lambda x_{2i} + v_i \tag{10.3.1}
$$

where $\lambda \neq 0$ and where $v_i$ is a stochastic error term such that $\sum x_{2i} v_i = 0$. (Why?)

Incidentally, the Ballentines shown in Figures 10.1b to 10.1e represent cases of imperfect collinearity.

In this case, estimation of the regression coefficients $\beta_2$ and $\beta_3$ may be possible. For example, substituting Eq. (10.3.1) into Eq. (7.4.7), we obtain

$$
\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\lambda^2\sum x_{2i}^2 + \sum v_i^2\right) - \left(\lambda\sum y_i x_{2i} + \sum y_i v_i\right)\left(\lambda\sum x_{2i}^2\right)}{\sum x_{2i}^2\left(\lambda^2\sum x_{2i}^2 + \sum v_i^2\right) - \left(\lambda\sum x_{2i}^2\right)^2} \tag{10.3.2}
$$

where use is made of $\sum x_{2i} v_i = 0$. A similar expression can be derived for $\hat\beta_3$.


9 In econometric literature, a function such as $(\beta_2 + \lambda\beta_3)$ is known as an estimable function.
Now, unlike Eq. (10.2.2), there is no reason to believe a priori that Eq. (10.3.2) cannot be estimated. Of course, if $v_i$ is sufficiently small, say, very close to zero, Eq. (10.3.1) will indicate almost perfect collinearity and we shall be back to the indeterminate case of Eq. (10.2.2).
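Multiplying out Eq. (10.3.2) makes the last remark precise. In the denominator the $\lambda^2(\sum x_{2i}^2)^2$ terms cancel, and in the numerator the $\lambda^2 \sum y_i x_{2i} \sum x_{2i}^2$ terms cancel, leaving

$$
\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum v_i^2\right) - \lambda\left(\sum x_{2i}^2\right)\left(\sum y_i v_i\right)}{\left(\sum x_{2i}^2\right)\left(\sum v_i^2\right)}
$$

The denominator is positive as long as $X_2$ varies and the $v_i$ are not all zero, so $\hat\beta_2$ is determinate. As $\sum v_i^2 \to 0$, however, numerator and denominator both tend to zero, which is how the $0/0$ form of Eq. (10.2.2) re-emerges in the limit.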


10.4 Multicollinearity: Much Ado about Nothing? 

Theoretical Consequences of Multicollinearity 

Recall that if the assumptions of the classical model are satisfied, the OLS estimators of the regression coefficients are BLUE (or BUE, if the normality assumption is added). Now it can be shown that even if multicollinearity is very high, as in the case of near multicollinearity, the OLS estimators still retain the property of BLUE. 10 Then what is the multicollinearity fuss all about? As Christopher Achen remarks (note also the Leamer quote at the beginning of this chapter):

Beginning students of methodology occasionally worry that their independent variables are correlated—the so-called multicollinearity problem. But multicollinearity violates no regression assumptions. Unbiased, consistent estimates will occur, and their standard errors will be correctly estimated. The only effect of multicollinearity is to make it hard to get coefficient estimates with small standard error. But having a small number of observations also has that effect, as does having independent variables with small variances. (In fact, at a theoretical level, multicollinearity, few observations and small variances on the independent variables are essentially all the same problem.) Thus “What should I do about multicollinearity?” is a question like “What should I do if I don’t have many observations?” No statistical answer can be given. 11

To drive home the importance of sample size, Goldberger coined the term 
micronumerosity, to counter the exotic polysyllabic name multicollinearity. According to 
Goldberger, exact micronumerosity (the counterpart of exact multicollinearity) arises 
when n, the sample size, is zero, in which case any kind of estimation is impossible. Near 
micronumerosity, like near multicollinearity, arises when the number of observations barely 
exceeds the number of parameters to be estimated. 

Leamer, Achen, and Goldberger are right in bemoaning the lack of attention given to the sample size problem and the undue attention to the multicollinearity problem. Unfortunately, in applied work involving secondary data (i.e., data collected by some agency, such as the GNP data collected by the government), an individual researcher may not be able to do much about the size of the sample data and may have to face “estimating problems important enough to warrant our treating it [i.e., multicollinearity] as a violation of the CLR [classical linear regression] model.” 12

First, it is true that even in the case of near multicollinearity the OLS estimators are unbiased. But unbiasedness is a multisample or repeated sampling property. What it means is that, keeping the values of the X variables fixed, if one obtains repeated samples and computes the OLS estimators for each of these samples, the average of the sample values will converge to the true population values of the parameters as the number of samples increases. But this says nothing about the properties of estimators in any given sample.

10 Since near multicollinearity per se does not violate the other assumptions listed in Chapter 7, the OLS estimators are BLUE as indicated there.

11 Christopher H. Achen, Interpreting and Using Regression, Sage Publications, Beverly Hills, Calif., 1982, pp. 82-83.

12 Peter Kennedy, A Guide to Econometrics, 3d ed., The MIT Press, Cambridge, Mass., 1992, p. 177.
Second, it is also true that collinearity does not destroy the property of minimum variance: In the class of all linear unbiased estimators, the OLS estimators have minimum variance; that is, they are efficient. But this does not mean that the variance of an OLS estimator will necessarily be small (in relation to the value of the estimator) in any given sample, as we shall demonstrate shortly.

Third, multicollinearity is essentially a sample (regression) phenomenon in the sense 
that, even if the X variables are not linearly related in the population, they may be so related 
in the particular sample at hand: When we postulate the theoretical or population regression 
function (PRF), we believe that all the X variables included in the model have a separate or 
independent influence on the dependent variable Y. But it may happen that in any given 
sample that is used to test the PRF some or all of the X variables are so highly collinear that 
we cannot isolate their individual influence on Y. So to speak, our sample lets us down, 
although the theory says that all the X’s are important. In short, our sample may not be 
“rich” enough to accommodate all X variables in the analysis. 

As an illustration, reconsider the consumption-income example of Chapter 3 (Example 3.1). Economists theorize that, besides income, the wealth of the consumer is also an important determinant of consumption expenditure. Thus, we may write

$$
\text{Consumption}_i = \beta_1 + \beta_2\,\text{Income}_i + \beta_3\,\text{Wealth}_i + u_i
$$

Now it may happen that when we obtain data on income and wealth, the two variables may be highly, if not perfectly, correlated: Wealthier people generally tend to have higher incomes. Thus, although in theory income and wealth are logical candidates to explain the behavior of consumption expenditure, in practice (i.e., in the sample) it may be difficult to disentangle the separate influences of income and wealth on consumption expenditure.

Ideally, to assess the individual effects of wealth and income on consumption expenditure we need a sufficient number of sample observations of wealthy individuals with low income, and high-income individuals with low wealth (recall Assumption 7). Although this may be possible in cross-sectional studies (by increasing the sample size), it is very difficult to achieve in aggregate time series work.

For all these reasons, the fact that the OLS estimators are BLUE despite multicollinearity is of little consolation in practice. We must see what happens or is likely to happen in any given sample, a topic discussed in the following section.


10.5 Practical Consequences of Multicollinearity 

In cases of near or high multicollinearity, one is likely to encounter the following consequences: 

1. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult.

2. Because of consequence 1, the confidence intervals tend to be much wider, leading to the acceptance of the “zero null hypothesis” (i.e., the true population coefficient is zero) more readily.

3. Also because of consequence 1, the t ratio of one or more coefficients tends to be statistically insignificant.

4. Although the t ratio of one or more coefficients is statistically insignificant, $R^2$, the overall measure of goodness of fit, can be very high.

5. The OLS estimators and their standard errors can be sensitive to small changes in the data.

The preceding consequences can be demonstrated as follows. 
Large Variances and Covariances of OLS Estimators

To see large variances and covariances, recall that for the model (10.2.1) the variances and covariances of $\hat\beta_2$ and $\hat\beta_3$ are given by

$$
\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2\left(1 - r_{23}^2\right)} \tag{7.4.12}
$$

$$
\operatorname{var}(\hat\beta_3) = \frac{\sigma^2}{\sum x_{3i}^2\left(1 - r_{23}^2\right)} \tag{7.4.15}
$$

$$
\operatorname{cov}(\hat\beta_2, \hat\beta_3) = \frac{-r_{23}\,\sigma^2}{\left(1 - r_{23}^2\right)\sqrt{\sum x_{2i}^2\sum x_{3i}^2}} \tag{7.4.17}
$$

where $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$.

It is apparent from Eqs. (7.4.12) and (7.4.15) that as $r_{23}$ tends toward 1, that is, as collinearity increases, the variances of the two estimators increase, and in the limit when $r_{23} = 1$ they are infinite. It is equally clear from Eq. (7.4.17) that as $r_{23}$ increases toward 1, the covariance of the two estimators also increases in absolute value. [Note: $\operatorname{cov}(\hat\beta_2, \hat\beta_3) = \operatorname{cov}(\hat\beta_3, \hat\beta_2)$.]

The speed with which variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as

$$
\text{VIF} = \frac{1}{1 - r_{23}^2} \tag{10.5.1}
$$

VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As $r_{23}^2$ approaches 1, the VIF approaches infinity. That is, as the extent of collinearity increases, the variance of an estimator increases, and in the limit it can become infinite. As can be readily seen, if there is no collinearity between $X_2$ and $X_3$, VIF will be 1.

Using this definition, we can express Eqs. (7.4.12) and (7.4.15) as

$$
\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2}\,\text{VIF} \tag{10.5.2}
$$

$$
\operatorname{var}(\hat\beta_3) = \frac{\sigma^2}{\sum x_{3i}^2}\,\text{VIF} \tag{10.5.3}
$$

which show that the variances of $\hat\beta_2$ and $\hat\beta_3$ are directly proportional to the VIF.

To give some idea about how fast the variances and covariances increase as $r_{23}$ increases, consider Table 10.1, which gives these variances and covariances for selected values of $r_{23}$. As this table shows, increases in $r_{23}$ have a dramatic effect on the estimated variances and covariances of the OLS estimators. When $r_{23} = 0.50$, $\operatorname{var}(\hat\beta_2)$ is 1.33 times the variance when $r_{23}$ is zero, but by the time $r_{23}$ reaches 0.95 it is about 10 times as high as when there is no collinearity. And lo and behold, an increase of $r_{23}$ from 0.95 to 0.995 makes the estimated variance 100 times that when collinearity is zero. The same dramatic effect is seen on the estimated covariance. All this can be seen in Figure 10.2.
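The multipliers in Table 10.1 follow directly from Eqs. (10.5.1) and (7.4.17): the variance of $\hat\beta_2$ is $A \times \text{VIF}$ and the covariance is $B \times r_{23}\,\text{VIF}$, where $A = \sigma^2/\sum x_{2i}^2$ and $B = -\sigma^2/\sqrt{\sum x_{2i}^2 \sum x_{3i}^2}$. The short NumPy sketch below (our own illustration, not from the text) regenerates those multipliers.

```python
import numpy as np

r23 = np.array([0.00, 0.50, 0.70, 0.80, 0.90, 0.95, 0.97, 0.99, 0.995, 0.999])

vif = 1.0 / (1.0 - r23**2)     # Eq. (10.5.1); also var(b2)/A by Eq. (10.5.2)
cov_factor = r23 * vif         # cov(b2, b3) = B * r23 * VIF, from Eq. (7.4.17)

print(f"{'r23':>6} {'VIF = var/A':>12} {'cov/B':>8}")
for r, v, c in zip(r23, vif, cov_factor):
    print(f"{r:6.3f} {v:12.2f} {c:8.2f}")
```

Small differences from the printed table in the last digits (for example, about 100.25 rather than 100.00 at $r_{23} = 0.995$) reflect rounding in the table, not a different formula.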

TABLE 10.1 The Effect of Increasing $r_{23}$ on $\operatorname{var}(\hat\beta_2)$ and $\operatorname{cov}(\hat\beta_2, \hat\beta_3)$

Columns: (1) value of $r_{23}$; (2) VIF; (3) $\operatorname{var}(\hat\beta_2)$*; (4) $\operatorname{var}(\hat\beta_2)(r_{23} \neq 0)/\operatorname{var}(\hat\beta_2)(r_{23} = 0)$; (5) $\operatorname{cov}(\hat\beta_2, \hat\beta_3)$

   (1)      (2)          (3)        (4)          (5)
  0.00     1.00            A          -            0
  0.50     1.33     1.33 × A       1.33     0.67 × B
  0.70     1.96     1.96 × A       1.96     1.37 × B
  0.80     2.78     2.78 × A       2.78     2.22 × B
  0.90     5.26     5.26 × A       5.26     4.73 × B
  0.95    10.26    10.26 × A      10.26     9.74 × B
  0.97    16.92    16.92 × A      16.92    16.41 × B
  0.99    50.25    50.25 × A      50.25    49.75 × B
  0.995  100.00   100.00 × A     100.00    99.50 × B
  0.999  500.00   500.00 × A     500.00   499.50 × B

Note: $A = \sigma^2/\sum x_{2i}^2$ and $B = -\sigma^2/\sqrt{\sum x_{2i}^2\sum x_{3i}^2}$; × = times.
*The same multipliers describe $\operatorname{var}(\hat\beta_3)$ when $A$ is taken to be $\sigma^2/\sum x_{3i}^2$, its value at $r_{23} = 0$.

FIGURE 10.2 The behavior of $\operatorname{var}(\hat\beta_2)$ as a function of $r_{23}$.

The results just discussed can be easily extended to the k-variable model. In such a model, the variance of the jth coefficient, as noted in Eq. (7.5.6), can be expressed as:

$$
\operatorname{var}(\hat\beta_j) = \frac{\sigma^2}{\sum x_j^2}\left(\frac{1}{1 - R_j^2}\right) \tag{7.5.6}
$$

where $\hat\beta_j$ = (estimated) partial regression coefficient of regressor $X_j$
$R_j^2$ = $R^2$ in the regression of $X_j$ on the remaining $(k - 2)$ regressors (Note: There are $[k - 1]$ regressors in the k-variable regression model.)
$\sum x_j^2 = \sum (X_j - \bar X_j)^2$

We can also write Eq. (7.5.6) as

$$
\operatorname{var}(\hat\beta_j) = \frac{\sigma^2}{\sum x_j^2}\,\text{VIF}_j \tag{10.5.4}
$$

As you can see from this expression, $\operatorname{var}(\hat\beta_j)$ is proportional to $\sigma^2$ and VIF but inversely proportional to $\sum x_j^2$. Thus, whether $\operatorname{var}(\hat\beta_j)$ is large or small will depend on three ingredients: (1) $\sigma^2$, (2) VIF, and (3) $\sum x_j^2$. The last one, which ties in with Assumption 7 of the classical model, states that the larger the variability in a regressor, the smaller the variance of the coefficient of that regressor, assuming the other two ingredients are constant, and therefore the greater the precision with which that coefficient can be estimated.

TABLE 10.2 The Effect of Increasing Collinearity on the 95% Confidence Interval for $\beta_2$: $\hat\beta_2 \pm 1.96\,\operatorname{se}(\hat\beta_2)$

Value of $r_{23}$    95% confidence interval for $\beta_2$
  0.00               $\hat\beta_2 \pm 1.96\sqrt{\sigma^2/\sum x_{2i}^2}$
  0.50               $\hat\beta_2 \pm 1.96\sqrt{1.33}\sqrt{\sigma^2/\sum x_{2i}^2}$
  0.95               $\hat\beta_2 \pm 1.96\sqrt{10.26}\sqrt{\sigma^2/\sum x_{2i}^2}$
  0.995              $\hat\beta_2 \pm 1.96\sqrt{100}\sqrt{\sigma^2/\sum x_{2i}^2}$

Note: $\sigma^2$ is treated as known, so the normal critical value 1.96 is used.

Before proceeding further, it may be noted that the inverse of the VIF is called tolerance (TOL). That is,

$$
\text{TOL}_j = \frac{1}{\text{VIF}_j} = \left(1 - R_j^2\right) \tag{10.5.5}
$$

When $R_j^2 = 1$ (i.e., perfect collinearity), $\text{TOL}_j = 0$, and when $R_j^2 = 0$ (i.e., no collinearity whatsoever), $\text{TOL}_j$ is 1. Because of the intimate connection between VIF and TOL, one can use them interchangeably.
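Equations (7.5.6), (10.5.4), and (10.5.5) suggest a direct recipe. For each regressor $X_j$, run the auxiliary regression of $X_j$ on an intercept and the remaining regressors, take its $R_j^2$, and compute $\text{VIF}_j = 1/(1 - R_j^2)$ and $\text{TOL}_j = 1 - R_j^2$. The function below is a minimal NumPy sketch of that recipe. The name `vif_and_tol` and its interface are our own illustration, not a library API. Applied to the $X_2$ and $X_3$ columns of the hypothetical data in Table 10.3, it should give the same VIF for both regressors, because with only two regressors $R_j^2 = r_{23}^2$.

```python
import numpy as np

def vif_and_tol(X):
    """Return (VIF_j, TOL_j) for each column of X via auxiliary regressions.

    X is an (n, m) array holding the regressors X_2, ..., X_k (no intercept column).
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    vif = np.empty(m)
    for j in range(m):
        xj = X[:, j]
        # Auxiliary regression: X_j on an intercept and the other regressors
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, xj, rcond=None)
        resid = xj - Z @ coef
        r2_j = 1.0 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r2_j)   # VIF_j = 1 / (1 - R_j^2)
    return vif, 1.0 / vif             # Eq. (10.5.5): TOL_j = 1 / VIF_j

# X2 and X3 from Table 10.3 (hypothetical data)
X = np.column_stack([[2, 0, 4, 6, 8], [4, 2, 12, 0, 16]])
vif, tol = vif_and_tol(X)
print("VIF:", vif.round(4), "TOL:", tol.round(4))
```

If two columns were perfectly collinear, $1 - R_j^2$ would be zero (or numerically tiny) and the VIF infinite or enormous, mirroring Eq. (10.5.1) at $r_{23} = 1$.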

Wider Confidence Intervals 

Because of the large standard errors, the confidence intervals for the relevant population parameters tend to be larger, as can be seen from Table 10.2. For example, when $r_{23} = 0.95$, the confidence interval for $\beta_2$ is larger than when $r_{23} = 0$ by a factor of $\sqrt{10.26}$, or about 3.

Therefore, in cases of high multicollinearity, the sample data may be compatible with a diverse set of hypotheses. Hence, the probability of accepting a false hypothesis (i.e., type II error) increases.
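Since the standard error is the square root of the variance in Eq. (10.5.2), the half-width of the interval $\hat\beta_2 \pm 1.96\,\operatorname{se}(\hat\beta_2)$ grows with $\sqrt{\text{VIF}}$. The short NumPy sketch below normalizes the standard error at $r_{23} = 0$ to 1 (an arbitrary choice, for illustration) and computes the widening factors for the values of $r_{23}$ used in Table 10.2.

```python
import numpy as np

r23 = np.array([0.0, 0.50, 0.95, 0.995])
widening = np.sqrt(1.0 / (1.0 - r23**2))   # sqrt(VIF), relative to r23 = 0
half_width = 1.96 * widening               # half-width when se(b2) at r23 = 0 equals 1

for r, w, h in zip(r23, widening, half_width):
    print(f"r23 = {r:5.3f}: interval {w:5.2f} times as wide, half-width {h:5.2f}")
```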

"Insignificant" t Ratios

Recall that to test the null hypothesis that, say, $\beta_2 = 0$, we use the t ratio, that is, $\hat\beta_2/\operatorname{se}(\hat\beta_2)$, and compare the estimated t value with the critical t value from the t table. But as we have seen, in cases of high collinearity the estimated standard errors increase dramatically, thereby making the t values smaller. Therefore, in such cases, one will increasingly accept the null hypothesis that the relevant true population value is zero. 13


13 In terms of the confidence intervals, the $\beta_2 = 0$ value will lie increasingly in the acceptance region as the degree of collinearity increases.
A High $R^2$ but Few Significant t Ratios

Consider the k-variable linear regression model:

$$
Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i
$$

In cases of high collinearity, it is possible to find, as we have just noted, that one or more of the partial slope coefficients are individually statistically insignificant on the basis of the t test. Yet the $R^2$ in such situations may be so high, say, in excess of 0.9, that on the basis of the F test one can convincingly reject the hypothesis that $\beta_2 = \beta_3 = \cdots = \beta_k = 0$. Indeed, this is one of the signals of multicollinearity: insignificant t values but a high overall $R^2$ (and a significant F value)!

We shall demonstrate this signal in the next section, but this outcome should not be surprising in view of our discussion on individual versus joint testing in Chapter 8. As you may recall, the real problem here is the covariances between the estimators, which, as formula (7.4.17) indicates, are related to the correlations between the regressors.

Sensitivity of OLS Estimators and Their Standard 
Errors to Small Changes in Data 

As long as multicollinearity is not perfect, estimation of the regression coefficients is possible, but the estimates and their standard errors become very sensitive to even the slightest change in the data.

To see this, consider Table 10.3. Based on these data, we obtain the following multiple regression (standard errors in the second row, t ratios in the third):

$$
\begin{aligned}
\hat Y_i &= 1.1939 + 0.4463X_{2i} + 0.0030X_{3i} \\
\text{se} &= (0.7737) \quad (0.1848) \quad (0.0851) \\
t &= (1.5431) \quad (2.4151) \quad (0.0358)
\end{aligned} \tag{10.5.6}
$$

$R^2 = 0.8101$, $r_{23} = 0.5523$, $\operatorname{cov}(\hat\beta_2, \hat\beta_3) = -0.00868$, df = 2

Regression (10.5.6) shows that none of the regression coefficients is individually significant at the conventional 1 or 5 percent levels of significance, although $\hat\beta_2$ is significant at the 10 percent level on the basis of a one-tail t test.

Now consider Table 10.4. The only difference between Tables 10.3 and 10.4 is that the third and fourth values of $X_3$ are interchanged. Using the data of Table 10.4, we now obtain

$$
\begin{aligned}
\hat Y_i &= 1.2108 + 0.4014X_{2i} + 0.0270X_{3i} \\
\text{se} &= (0.7480) \quad (0.2721) \quad (0.1252) \\
t &= (1.6187) \quad (1.4752) \quad (0.2158)
\end{aligned} \tag{10.5.7}
$$

$R^2 = 0.8143$, $r_{23} = 0.8285$, $\operatorname{cov}(\hat\beta_2, \hat\beta_3) = -0.0282$, df = 2

As a result of a slight change in the data, we see that $\hat\beta_2$, which was statistically significant before at the 10 percent level of significance, is no longer significant even at that level. Also note that in Eq. (10.5.6) $\operatorname{cov}(\hat\beta_2, \hat\beta_3) = -0.00868$, whereas in Eq. (10.5.7) it is $-0.0282$, a more than threefold increase. All these changes may be attributable to increased multicollinearity: In Eq. (10.5.6) $r_{23} = 0.5523$, whereas in Eq. (10.5.7) it is 0.8285. Similarly, the standard errors of $\hat\beta_2$ and $\hat\beta_3$ increase between the two regressions, a usual symptom of collinearity.

TABLE 10.3 Hypothetical Data on Y, X2, and X3

  Y    X2    X3
  1     2     4
  2     0     2
  3     4    12
  4     6     0
  5     8    16

TABLE 10.4 Hypothetical Data on Y, X2, and X3

  Y    X2    X3
  1     2     4
  2     0     2
  3     4     0
  4     6    12
  5     8    16

We noted earlier that in the presence of high collinearity one cannot estimate the individual regression coefficients precisely but that linear combinations of these coefficients may be estimated more precisely. This fact can be substantiated from the regressions (10.5.6) and (10.5.7). In the first regression the sum of the two partial slope coefficients is 0.4493 and in the second it is 0.4284, practically the same. Not only that, their standard errors are practically the same, 0.1550 vs. 0.1823. 14 Note, however, that the coefficient of $X_3$ has changed dramatically, from 0.003 to 0.027.
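Regressions (10.5.6) and (10.5.7), together with the standard errors of $\hat\beta_2 + \hat\beta_3$ from footnote 14, can be reproduced with ordinary least squares. The sketch below is our own NumPy implementation of the usual formulas, written in matrix form for brevity: $\hat\beta = (X'X)^{-1}X'y$ and $\widehat{\operatorname{var}}(\hat\beta) = \hat\sigma^2(X'X)^{-1}$. The helper name `ols` is hypothetical. It is applied to the data of Tables 10.3 and 10.4.

```python
import numpy as np

def ols(y, x2, x3):
    """OLS of y on an intercept, X2, X3; returns (coef, covariance matrix, r23)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), x2, x3])
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # unbiased estimate of sigma^2
    r23 = np.corrcoef(x2, x3)[0, 1]
    return coef, sigma2 * XtX_inv, r23

Y = [1, 2, 3, 4, 5]
X2 = [2, 0, 4, 6, 8]
X3_table_10_3 = [4, 2, 12, 0, 16]
X3_table_10_4 = [4, 2, 0, 12, 16]   # third and fourth values interchanged

results = {}
for label, X3 in [("10.5.6", X3_table_10_3), ("10.5.7", X3_table_10_4)]:
    coef, cov, r23 = ols(Y, X2, X3)
    se = np.sqrt(np.diag(cov))
    # Footnote 14: se(b2 + b3) = sqrt(var(b2) + var(b3) + 2 cov(b2, b3))
    se_sum = np.sqrt(cov[1, 1] + cov[2, 2] + 2 * cov[1, 2])
    results[label] = (coef, se, r23, cov[1, 2], se_sum)
    print(f"Eq. ({label}): coef = {coef.round(4)}, se = {se.round(4)}, "
          f"r23 = {r23:.4f}, cov(b2, b3) = {cov[1, 2]:.5f}, "
          f"b2 + b3 = {coef[1] + coef[2]:.4f} (se {se_sum:.4f})")
```

Swapping two values of $X_3$ moves $r_{23}$ from about 0.55 to about 0.83, and with it the individual coefficients of $X_3$, while the sum $\hat\beta_2 + \hat\beta_3$ and its standard error barely change.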


Consequences of Micronumerosity 

In a parody of the consequences of multicollinearity, and in a tongue-in-cheek manner, 
Goldberger cites exactly similar consequences of micronumerosity, that is, analysis based 
on small sample size. 15 The reader is advised to read Goldberger’s analysis to see why he 
regards micronumerosity as being as important as multicollinearity. 


10.6 An Illustrative Example 


EXAMPLE 10.1 Consumption Expenditure in Relation to Income and Wealth

To illustrate the various points made thus far, let us consider the consumption-income example from the introduction. Table 10.5 contains hypothetical data on consumption, income, and wealth. If we assume that consumption expenditure is linearly related to income and wealth, then from Table 10.5 we obtain the following regression:

$$
\begin{aligned}
\hat Y_i &= 24.7747 + 0.9415X_{2i} - 0.0424X_{3i} \\
\text{se} &= (6.7525) \quad (0.8229) \quad (0.0807) \\
t &= (3.6690) \quad (1.1442) \quad (-0.5261)
\end{aligned} \tag{10.6.1}
$$

$R^2 = 0.9635$, $\bar R^2 = 0.9531$, df = 7


14 These standard errors are obtained from the formula

$$
\operatorname{se}(\hat\beta_2 + \hat\beta_3) = \sqrt{\operatorname{var}(\hat\beta_2) + \operatorname{var}(\hat\beta_3) + 2\operatorname{cov}(\hat\beta_2, \hat\beta_3)}
$$

Note that increasing collinearity increases the variances of $\hat\beta_2$ and $\hat\beta_3$, but these variances may be offset if there is high negative covariance between the two, as our results clearly point out.

15 Goldberger, op. cit., pp. 248-250.
EXAMPLE 10.1 (Continued)

TABLE 10.5 Hypothetical Data on Consumption Expenditure Y, Income X2, and Wealth X3

  Y, $    X2, $    X3, $
    70       80      810
    65      100     1009
    90      120     1273
    95      140     1425
   110      160     1633
   115      180     1876
   120      200     2052
   140      220     2201
   155      240     2435
   150      260     2686

TABLE 10.6 ANOVA Table for the Consumption-Income-Wealth Example

  Source of Variation         SS       df          MSS
  Due to regression     8,565.5541      2   4,282.7770
  Due to residual         324.4459      7      46.3494


Regression (10.6.1) shows that income and wealth together explain about 96 percent of the variation in consumption expenditure, and yet neither of the slope coefficients is individually statistically significant. Moreover, not only is the wealth variable statistically insignificant but also it has the wrong sign. A priori, one would expect a positive relationship between consumption and wealth. Although $\hat{\beta}_2$ and $\hat{\beta}_3$ are individually statistically insignificant, if we test the hypothesis that $\beta_2 = \beta_3 = 0$ simultaneously, this hypothesis can be rejected, as Table 10.6 shows. Under the usual assumption we obtain


$$F = \frac{4282.7770}{46.3494} = 92.4019 \qquad (10.6.2)$$


This F value is obviously highly significant. 
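As a check on regression (10.6.1) and the F value in (10.6.2), the sketch below refits the model to the Table 10.5 data with NumPy's least-squares solver. The helper `ols_with_intercept` is a name introduced here for illustration, and the standard errors use the usual homoscedastic formula $\hat{\sigma}^2(X'X)^{-1}$, which is assumed to be the basis of the reported figures.

```python
import numpy as np

# Table 10.5 (hypothetical data): consumption Y, income X2, wealth X3, in dollars
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X2 = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
X3 = np.array([810, 1009, 1273, 1425, 1633, 1876, 2052, 2201, 2435, 2686], dtype=float)


def ols_with_intercept(y, *regressors):
    """Fit y on a constant plus the regressors by OLS.

    Returns (coefficients, standard errors, R^2, ESS, RSS, residual df).
    """
    X = np.column_stack([np.ones_like(y), *regressors])
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)                    # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    sigma2 = rss / (n - k)                        # unbiased estimate of sigma^2
    cov = sigma2 * np.linalg.inv(X.T @ X)         # var-cov matrix of the estimators
    return beta, np.sqrt(np.diag(cov)), 1 - rss / tss, tss - rss, rss, n - k


beta, se, r2, ess, rss, df_resid = ols_with_intercept(Y, X2, X3)
df_reg = len(beta) - 1                            # number of slope coefficients
F = (ess / df_reg) / (rss / df_resid)             # Eq. (10.6.2): regression MSS / residual MSS

print("coefficients:", beta.round(4))
print("standard errors:", se.round(4))
print("t ratios:", (beta / se).round(4))
print(f"R^2 = {r2:.4f}, F = {F:.4f}")
```

Run as is, it should reproduce the estimates of (10.6.1) to the reported precision and an F value of about 92.4.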

It is interesting to look at this result geometrically. (See Figure 10.3.) Based on the regression (10.6.1), we have established the individual 95 percent confidence intervals for $\beta_2$ and $\beta_3$ following the usual procedure discussed in Chapter 8. As these intervals show, individually each of them includes the value of zero. Therefore, individually we can accept the hypothesis that the two partial slopes are zero. But when we establish the joint confidence interval to test the hypothesis that $\beta_2 = \beta_3 = 0$, that hypothesis cannot be accepted, since the joint confidence interval, actually an ellipse, does not include the origin.¹⁶
As already pointed out, when collinearity is high, tests on individual regressors are not reliable; in such cases it is the overall F test that will show if Y is related to the various regressors.
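The geometry of Figure 10.3 can be checked numerically. The sketch below refits (10.6.1), builds the individual 95 percent intervals $\hat{\beta}_j \pm t_{0.025}\,\text{se}(\hat{\beta}_j)$, and evaluates at the origin the quadratic form that defines the joint confidence ellipse. The critical values 2.365 and 4.74 are taken from standard t and F tables for 7 and (2, 7) df and are hard-coded as an assumption, to avoid depending on SciPy.

```python
import numpy as np

# Table 10.5 data
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X2 = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
X3 = np.array([810, 1009, 1273, 1425, 1633, 1876, 2052, 2201, 2435, 2686], dtype=float)

X = np.column_stack([np.ones_like(Y), X2, X3])
n, k = X.shape
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta
sigma2 = float(resid @ resid) / (n - k)
cov = sigma2 * np.linalg.inv(X.T @ X)

T_CRIT = 2.365   # 5% two-tailed critical t, 7 df (from t tables)
F_CRIT = 4.74    # 5% critical F, (2, 7) df (from F tables)

# Individual 95 percent intervals: beta_j +/- t_crit * se(beta_j)
intervals = {}
for j, name in [(1, "beta2"), (2, "beta3")]:
    half_width = T_CRIT * np.sqrt(cov[j, j])
    intervals[name] = (beta[j] - half_width, beta[j] + half_width)
    lo, hi = intervals[name]
    print(f"{name}: [{lo:.4f}, {hi:.4f}], includes zero: {lo <= 0 <= hi}")

# Joint 95 percent region: all (b2, b3) with (b - b_hat)' V^{-1} (b - b_hat) / 2 <= F_CRIT,
# which is an ellipse. Evaluate the left-hand side at the origin b2 = b3 = 0.
b_hat = beta[1:]
V = cov[1:, 1:]
origin_stat = float(b_hat @ np.linalg.solve(V, b_hat)) / 2
print(f"statistic at origin = {origin_stat:.2f}, origin inside ellipse: {origin_stat <= F_CRIT}")
```

For the hypothesis $\beta_2 = \beta_3 = 0$, the statistic evaluated at the origin coincides with the F value of (10.6.2), which is why the origin lies far outside the ellipse even though each individual interval covers zero.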

16 As noted in Section 5.3, the topic of joint confidence interval is rather involved. The interested reader may consult the reference cited there.


FIGURE 10.3 Individual confidence intervals for $\beta_2$ and $\beta_3$ and joint confidence interval (ellipse) for $\beta_2$ and $\beta_3$.


Our example shows dramatically what multicollinearity does. The fact that the F test is significant but the t values of $X_2$ and $X_3$ are individually insignificant means that the two variables are so highly correlated that it is impossible to isolate the individual impact of either income or wealth on consumption. As a matter of fact, if we regress $X_3$ on $X_2$, we obtain

$$
\begin{aligned}
\hat{X}_{3i} &= 7.5454 + 10.1909X_{2i} \\
\text{se} &= (29.4758) \qquad (0.1643) \qquad\qquad (10.6.3) \\
t &= (0.2560) \qquad (62.0405) \qquad R^2 = 0.9979
\end{aligned}
$$

which shows that there is almost perfect collinearity between $X_3$ and $X_2$.

Now let us see what happens if we regress Y on $X_2$ only:

$$
\begin{aligned}
\hat{Y}_i &= 24.4545 + 0.5091X_{2i} \\
\text{se} &= (6.4138) \qquad (0.0357) \qquad\qquad (10.6.4) \\
t &= (3.8128) \qquad (14.2432) \qquad R^2 = 0.9621
\end{aligned}
$$

In Eq. (10.6.1) the income variable was statistically insignificant, whereas now it is highly significant. If instead of regressing Y on $X_2$ we regress it on $X_3$, we obtain

$$
\begin{aligned}
\hat{Y}_i &= 24.411 + 0.0498X_{3i} \\
\text{se} &= (6.874) \qquad (0.0037) \qquad\qquad (10.6.5) \\
t &= (3.551) \qquad (13.29) \qquad R^2 = 0.9567
\end{aligned}
$$

We see that wealth now has a significant impact on consumption expenditure, whereas in Eq. (10.6.1) it had no statistically significant effect on consumption expenditure.

Regressions (10.6.4) and (10.6.5) show very clearly that in situations of extreme multicollinearity dropping the highly collinear variable will often make the other X variable statistically significant. This result would suggest that a way out of extreme collinearity is to drop the collinear variable, but we shall have more to say about it in Section 10.8.
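Regressions (10.6.3) to (10.6.5) use the same ten observations, so they can be reproduced with one small two-variable OLS helper. `simple_ols` is a hypothetical name, and the sketch relies on NumPy only.

```python
import numpy as np

# Table 10.5 data
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X2 = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
X3 = np.array([810, 1009, 1273, 1425, 1633, 1876, 2052, 2201, 2435, 2686], dtype=float)


def simple_ols(y, x):
    """Two-variable OLS y = a + b x; returns (a, b, se(b), R^2)."""
    A = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - (a + b * x)
    rss = float(resid @ resid)
    sigma2 = rss / (len(y) - 2)
    se_b = np.sqrt(sigma2 / ((x - x.mean()) ** 2).sum())
    r2 = 1 - rss / float(((y - y.mean()) ** 2).sum())
    return a, b, se_b, r2


results = {}
for label, y, x in [("X3 on X2 (10.6.3)", X3, X2),
                    ("Y on X2 (10.6.4)", Y, X2),
                    ("Y on X3 (10.6.5)", Y, X3)]:
    results[label] = simple_ols(y, x)
    a, b, se_b, r2 = results[label]
    print(f"{label}: intercept = {a:.4f}, slope = {b:.4f}, t = {b / se_b:.2f}, R^2 = {r2:.4f}")
```

Once the other regressor is dropped, each slope is estimated with a t ratio well above 10, matching the pattern the text describes.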
EXAMPLE 10.2 Consumption Function for the United States, 1947-2000

We now consider a concrete set of data on real consumption expenditure (C), real disposable personal income (Yd), real wealth (W), and real interest rate (I) for the United States for the period 1947-2000. The raw data are given in Table 10.7.

TABLE 10.7 U.S. Consumption Expenditure for the Period 1947-2000

Year  C       Yd      W          I
1947  976.4   1035.2  5166.815   -10.35094
1948  998.1   1090    5280.757   -4.719804
1949  1025.3  1095.6  5607.351   1.044063
1950  1090.9  1192.7  5759.515   0.407346
1951  1107.1  1227    6086.056   -5.283152
1952  1142.4  1266.8  6243.864   -0.277011
1953  1197.2  1327.5  6355.613   0.561137
1954  1221.9  1344    6797.027   -0.138476
1955  1310.4  1433.8  7172.242   0.261997
1956  1348.8  1502.3  7375.18    -0.736124
1957  1381.8  1539.5  7315.286   -0.260683
1958  1393    1553.7  7869.975   -0.57463
1959  1470.7  1623.8  8188.054   2.295943
1960  1510.8  1664.8  8351.757   1.511181
1961  1541.2  1720    8971.872   1.296432
1962  1617.3  1803.5  9091.545   1.395922
1963  1684    1871.5  9436.097   2.057616
1964  1784.8  2006.9  10003.4    2.026599
1965  1897.6  2131    10562.81   2.111669
1966  2006.1  2244.6  10522.04   2.020251
1967  2066.2  2340.5  11312.07   1.212616
1968  2184.2  2448.2  12145.41   1.054986
1969  2264.8  2524.3  11672.25   1.732154
1970  2317.5  2630    11650.04   1.166228
1971  2405.2  2745.3  12312.92   -0.712241
1972  2550.5  2874.3  13499.92   -0.155737
1973  2675.9  3072.3  13080.96   1.413839
1974  2653.7  3051.9  11868.79   -1.042571
1975  2710.9  3108.5  12634.36   -3.533585
1976  2868.9  3243.5  13456.78   -0.656766
1977  2992.1  3360.7  13786.31   -1.190427
1978  3124.7  3527.5  14450.5    0.113048
1979  3203.2  3628.6  15340      1.70421
1980  3193    3658    15964.95   2.298496
1981  3236    3741.1  15964.99   4.703847
1982  3275.5  3791.7  16312.51   4.449027
1983  3454.3  3906.9  16944.85   4.690972
1984  3640.6  4207.6  17526.75   5.848332
1985  3820.9  4347.8  19068.35   4.330504
1986  3981.2  4486.6  20530.04   3.768031
1987  4113.4  4582.5  21235.69   2.819469
1988  4279.5  4784.1  22331.99   3.287061
1989  4393.7  4906.5  23659.8    4.317956
1990  4474.5  5014.2  23105.13   3.595025
1991  4466.6  5033    24050.21   1.802757
1992  4594.5  5189.3  24418.2    1.007439
1993  4748.9  5261.3  25092.33   0.62479
1994  4928.1  5397.2  25218.6    2.206002
1995  5075.6  5539.1  27439.73   3.333143
1996  5237.5  5677.7  29448.19   3.083201
1997  5423.9  5854.5  32664.07   3.12
1998  5683.7  6168.6  35587.02   3.583909
1999  5968.4  6320    39591.26   3.245271
2000  6257.8  6539.2  38167.72   3.57597

Source: See Table 7.12.


We use the following model for analysis:

$$\ln C_t = \beta_1 + \beta_2 \ln Yd_t + \beta_3 \ln W_t + \beta_4 I_t + u_t \qquad (10.6.6)$$

where ln stands for the natural logarithm.

In this model the coefficients $\beta_2$ and $\beta_3$ give the income and wealth elasticities, respectively (why?), and $\beta_4$ gives the semielasticity (why?). The results of regression (10.6.6) are given in the following table.

Dependent Variable: LOG(C)
Method: Least Squares
Sample: 1947-2000
Included observations: 54

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C              -0.4677       0.042778     -10.93343     0.0000
LOG(YD)        0.804873      0.017498     45.99836      0.0000
LOG(WEALTH)    0.201270      0.017593     11.44060      0.0000
INTEREST       -0.002689     0.000762     -3.529265     0.0009

R-squared             0.999560    Mean dependent var.      7.826093
Adjusted R-squared    0.999533    S.D. dependent var.      0.552368
S.E. of regression    0.011934    Akaike info criterion    -5.947703
Sum squared resid.    0.007121    Schwarz criterion        -5.800371
Log likelihood        164.5880    Hannan-Quinn criter.     -5.890883
F-statistic           37832.5     Durbin-Watson stat.      1.289219
Prob(F-statistic)     0.000000

Note: LOG stands for natural log.


The results show that all the estimated coefficients are highly statistically significant, for their p values are extremely small. The estimated coefficients are interpreted as follows. The income elasticity is about 0.80, suggesting that, holding other variables constant, if income goes up by 1 percent, the mean consumption expenditure goes up by about 0.8 percent. The wealth coefficient is about 0.20, meaning that if wealth goes up by 1 percent, mean consumption goes up by only 0.2 percent, again holding other variables constant.

The coefficient of the interest rate variable tells us that as the interest rate goes up by one percentage point, consumption expenditure goes down by about 0.27 percent, ceteris paribus.
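The two "(why?)" questions about (10.6.6) and the interest-rate figure follow from differentiating the model. The derivation below assumes the usual approximation that 100 times a change in ln C is the percentage change in C, and uses the interest-rate coefficient -0.002689 (the reported t ratio -3.529265 times its standard error 0.000762).

$$
\begin{aligned}
\beta_2 &= \frac{\partial \ln C_t}{\partial \ln Yd_t} = \frac{\partial C_t / C_t}{\partial Yd_t / Yd_t}, \qquad
\beta_3 = \frac{\partial \ln C_t}{\partial \ln W_t} = \frac{\partial C_t / C_t}{\partial W_t / W_t} \\
\beta_4 &= \frac{\partial \ln C_t}{\partial I_t} = \frac{\partial C_t / C_t}{\partial I_t}
\quad\Longrightarrow\quad 100\,\hat{\beta}_4 = 100 \times (-0.002689) \approx -0.27 \ \text{percent per percentage point of } I_t
\end{aligned}
$$

The first two ratios are proportional changes divided by proportional changes, hence elasticities; the third divides a proportional change by an absolute change, hence a semielasticity. At this magnitude the exact change, $100(e^{\hat{\beta}_4} - 1)$, is also about -0.27 percent.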

All the regressors have signs that accord with prior expectations, that is, income and 
wealth both have a positive impact on consumption but interest rate has a negative 
impact. 

Do we have to worry about the problem of multicollinearity in the present case? Apparently not, because all the coefficients have the right signs, each coefficient is individually statistically significant, and the F value is also statistically highly significant, suggesting that, collectively, all the variables have a significant impact on consumption expenditure. The $R^2$ value is also quite high.

Of course, there is usually some degree of collinearity among economic variables. As 
long as it is not exact, we can still estimate the parameters of the model. For now, all we 
can say is that, in the present example, collinearity, if any, does not seem to be very severe. 
But in Section 10.7 we provide some diagnostic tests to detect collinearity and reexamine 
the U.S. consumption function to determine whether it is plagued by the collinearity 
problem. 


10.7 Detection of Multicollinearity 

Having studied the nature and consequences of multicollinearity, the natural question is: How does one know that collinearity is present in any given situation, especially in models involving more than two explanatory variables? Here it is useful to bear in mind Kmenta’s warning:

1. Multicollinearity is a question of degree and not of kind. The meaningful distinction is 
not between the presence and the absence of multicollinearity, but between its various degrees. 

2. Since multicollinearity refers to the condition of the explanatory variables that are assumed to be nonstochastic, it is a feature of the sample and not of the population.

Therefore, we do not “test for multicollinearity” but can, if we wish, measure its degree in 
any particular sample. 17 

Since multicollinearity is essentially a sample phenomenon, arising out of the largely nonexperimental data collected in most social sciences, we do not have one unique method of detecting it or measuring its strength. What we have are some rules of thumb, some informal and some formal, but rules of thumb all the same. We now consider some of these rules.

1. High $R^2$ but few significant t ratios. As noted, this is the “classic” symptom of multicollinearity. If $R^2$ is high, say, in excess of 0.8, the F test in most cases will reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t tests will show that none or very few of the partial slope coefficients are statistically different from zero. This fact was clearly demonstrated by our consumption-income-wealth example.

Although this diagnostic is sensible, its disadvantage is that “it is too strong in the sense 
that multicollinearity is considered as harmful only when all of the influences of the 
explanatory variables on Y cannot be disentangled.” 18 

17 Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, p. 431. 

18 Ibid., p. 439.
2. High pair-wise correlations among regressors. Another suggested rule of thumb is that if the pair-wise or zero-order correlation coefficient between two regressors is high, say, in excess of 0.8, then multicollinearity is a serious problem. The problem with this criterion is that, although high zero-order correlations may suggest collinearity, it is not necessary that they be high to have collinearity in any specific case. To put the matter somewhat technically, high zero-order correlations are a sufficient but not a necessary condition for the existence of multicollinearity because it can exist even though the zero-order or simple correlations are comparatively low (say, less than 0.50). To see this relationship, suppose we have a four-variable model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$$

and suppose that

$$X_{4i} = \lambda_2 X_{2i} + \lambda_3 X_{3i}$$

where $\lambda_2$ and $\lambda_3$ are constants, not both zero. Obviously, $X_4$ is an exact linear combination of $X_2$ and $X_3$, giving $R^2_{4.23} = 1$, the coefficient of determination in the regression of $X_4$ on $X_2$ and $X_3$.

Now recalling the formula (7.11.5) from Chapter 7, we can write

$$R^2_{4.23} = \frac{r_{42}^2 + r_{43}^2 - 2r_{42}r_{43}r_{23}}{1 - r_{23}^2} \qquad (10.7.1)$$

But since $R^2_{4.23} = 1$ because of perfect collinearity, we obtain

$$1 = \frac{r_{42}^2 + r_{43}^2 - 2r_{42}r_{43}r_{23}}{1 - r_{23}^2} \qquad (10.7.2)$$

It is not difficult to see that Eq. (10.7.2) is satisfied by $r_{42} = 0.5$, $r_{43} = 0.5$, and $r_{23} = -0.5$, which are not very high values.
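Equation (10.7.2) can be checked directly, and a small simulation makes the point concrete: with $X_4 = X_2 + X_3$ and $r_{23}$ near -0.5, every pairwise correlation is near 0.5 in absolute value, yet the auxiliary $R^2_{4.23}$ is 1. The data are simulated and purely hypothetical, and `r2_4_23` is an illustrative helper.

```python
import numpy as np


def r2_4_23(r42, r43, r23):
    """R^2 of X4 on X2 and X3 from zero-order correlations, Eq. (10.7.1)."""
    return (r42**2 + r43**2 - 2 * r42 * r43 * r23) / (1 - r23**2)


print(r2_4_23(0.5, 0.5, -0.5))  # 1.0, so Eq. (10.7.2) holds

# Hypothetical simulated data: X2 and X3 with correlation near -0.5, and
# X4 = X2 + X3 (lambda2 = lambda3 = 1), an exact linear combination.
rng = np.random.default_rng(0)
n = 10_000
z1, z2 = rng.standard_normal((2, n))
rho = -0.5
X2 = z1
X3 = rho * z1 + np.sqrt(1 - rho**2) * z2
X4 = X2 + X3

corr = np.corrcoef(np.vstack([X2, X3, X4]))
r23, r42, r43 = corr[0, 1], corr[2, 0], corr[2, 1]

# Auxiliary regression of X4 on a constant, X2, and X3
A = np.column_stack([np.ones(n), X2, X3])
coef, *_ = np.linalg.lstsq(A, X4, rcond=None)
resid = X4 - A @ coef
r2_aux = 1 - float(resid @ resid) / float(((X4 - X4.mean()) ** 2).sum())
print(f"r42 = {r42:.3f}, r43 = {r43:.3f}, r23 = {r23:.3f}, R^2_4.23 = {r2_aux:.6f}")
```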

Therefore, in models involving more than two explanatory variables, the simple or zero-order correlation will not provide an infallible guide to the presence of multicollinearity. Of course, if there are only two explanatory variables, the zero-order correlations will suffice.

3. Examination of partial correlations. Because of the problem just mentioned in relying on zero-order correlations, Farrar and Glauber have suggested that one should look at the partial correlation coefficients.¹⁹ Thus, in the regression of Y on $X_2$, $X_3$, and $X_4$, a finding that $R^2_{1.234}$ is very high but $r^2_{12.34}$, $r^2_{13.24}$, and $r^2_{14.23}$ are comparatively low may suggest that the variables $X_2$, $X_3$, and $X_4$ are highly intercorrelated and that at least one of these variables is superfluous.

Although a study of the partial correlations may be useful, there is no guarantee that they will provide an infallible guide to multicollinearity, for it may happen that both $R^2$ and all the partial correlations are sufficiently high. But more importantly, C. Robert Wichers has shown²⁰ that the Farrar-Glauber partial correlation test is ineffective in that a given partial correlation may be compatible with different multicollinearity patterns. The Farrar-Glauber test has also been severely criticized by T. Krishna Kumar²¹ and John O’Hagan and Brendan McCabe.²²


19 D. E. Farrar and R. R. Glauber, "Multicollinearity in Regression Analysis: The Problem Revisited," Review of Economics and Statistics, vol. 49, 1967, pp. 92-107.

20 "The Detection of Multicollinearity: A Comment," Review of Economics and Statistics, vol. 57, 1975, pp. 365-366.

4. Auxiliary regressions. Since multicollinearity arises because one or more of the regressors are exact or approximately linear combinations of the other regressors, one way of finding out which X variable is related to other X variables is to regress each $X_i$ on the remaining X variables and compute the corresponding $R^2$, which we designate as $R_i^2$; each one of these regressions is called an auxiliary regression, auxiliary to the main regression of Y on the X’s. Then, following the relationship between F and $R^2$ established in Eq. (8.4.11), the variable


$$F_i = \frac{R^2_{x_i \cdot x_2 x_3 \cdots x_k} / (k-2)}{\left(1 - R^2_{x_i \cdot x_2 x_3 \cdots x_k}\right) / (n-k+1)} \qquad (10.7.3)$$

follows the F distribution with $k - 2$ and $n - k + 1$ df. In Eq. (10.7.3) n stands for the sample size, k stands for the number of explanatory variables including the intercept term, and $R^2_{x_i \cdot x_2 x_3 \cdots x_k}$ is the coefficient of determination in the regression of variable $X_i$ on the remaining X variables.²³

If the computed F exceeds the critical $F_i$ at the chosen level of significance, it is taken to mean that the particular $X_i$ is collinear with other X’s; if it does not exceed the critical $F_i$, we say that it is not collinear with other X’s, in which case we may retain that variable in the model. If $F_i$ is statistically significant, we will still have to decide whether the particular $X_i$ should be dropped from the model. This question will be taken up in Section 10.8.
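Equation (10.7.3) is simple to compute once the auxiliary $R_i^2$ is known. The sketch below applies it to Example 10.1, where the auxiliary regression (10.6.3) of $X_3$ on $X_2$ gave $R^2 = 0.9979$ with n = 10 and k = 3 (intercept, $X_2$, $X_3$), so the df are 1 and 8. The function name `auxiliary_f` is hypothetical, and the critical value 5.32 is taken from F tables.

```python
def auxiliary_f(r2_aux, n, k):
    """F_i of Eq. (10.7.3).

    r2_aux: R^2 from regressing X_i on the remaining X variables
    n: sample size
    k: number of explanatory variables, including the intercept
    """
    return (r2_aux / (k - 2)) / ((1 - r2_aux) / (n - k + 1))


F_3 = auxiliary_f(0.9979, n=10, k=3)
F_CRIT_1_8 = 5.32   # 5% critical F for (1, 8) df, from F tables
print(f"F = {F_3:.1f}; collinear at the 5% level: {F_3 > F_CRIT_1_8}")
```

The result, roughly 3,800, is far above the critical value, so $X_3$ would be judged collinear with $X_2$. With a single regressor in the auxiliary regression this F is the square of its t ratio; the small gap from $62.0405^2$ reflects rounding $R^2$ to four digits.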

But this method is not without its drawbacks, for 

... if the multicollinearity involves only a few variables so that the auxiliary regressions do not 
suffer from extensive multicollinearity, the estimated coefficients may reveal the nature of the 
linear dependence among the regressors. Unfortunately, if there are several complex linear 
associations, this curve fitting exercise may not prove to be of much value as it will be difficult 
to identify the separate interrelationships. 24 

Instead of formally testing all auxiliary $R^2$ values, one may adopt Klein’s rule of thumb, which suggests that multicollinearity may be a troublesome problem only if the $R^2$ obtained from an auxiliary regression is greater than the overall $R^2$, that is, that obtained from the regression of Y on all the regressors.²⁵ Of course, like all other rules of thumb, this one should be used judiciously.
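Klein's rule reduces to a comparison of numbers that are already reported. With only two regressors, both auxiliary regressions have the same $R^2$ (the squared correlation between $X_2$ and $X_3$), so for Example 10.1 the 0.9979 of (10.6.3) is compared with the overall 0.9635 of (10.6.1). `klein_flags` is a hypothetical helper.

```python
def klein_flags(aux_r2, overall_r2):
    """Regressors whose auxiliary R^2 exceeds the overall R^2 (Klein's rule of thumb)."""
    return [name for name, r2 in aux_r2.items() if r2 > overall_r2]


flagged = klein_flags({"X2": 0.9979, "X3": 0.9979}, overall_r2=0.9635)
print(flagged)  # ['X2', 'X3']
```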

21 "Multicollinearity in Regression Analysis," Review of Economics and Statistics, vol. 57, 1975, pp. 366-368.
22 "Tests for the Severity of Multicollinearity in Regression Analysis: A Comment," Review of Economics and Statistics, vol. 57, 1975, pp. 368-370.

23 For example, $R_2^2$ can be obtained by regressing $X_{2i}$ as follows: $X_{2i} = a_1 + a_3X_{3i} + a_4X_{4i} + \cdots + a_kX_{ki} + \hat{u}_i$.

24 George C. Judge, R. Carter Hill, William E. Griffiths, Helmut Lutkepohl, and Tsoung-Chao Lee, Introduction to the Theory and Practice of Econometrics, John Wiley & Sons, New York, 1982, p. 621.
25 Lawrence R. Klein, An Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, NJ, 1962, p. 101.


5. Eigenvalues and condition index. From EViews and Stata, we can find the eigenvalues and the condition index to diagnose multicollinearity. We will not discuss eigenvalues here, for that would take us into topics in matrix algebra that are beyond the scope of this book. From these eigenvalues, however, we can derive what is known as the condition number k, defined as

$$k = \frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}$$

and the condition index (CI), defined as

$$\text{CI} = \sqrt{\frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}} = \sqrt{k}$$


Then we have this rule of thumb: If k is between 100 and 1000 there is moderate to strong multicollinearity, and if it exceeds 1000 there is severe multicollinearity. Alternatively, if the CI (= $\sqrt{k}$) is between 10 and 30, there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity.

For the illustrative example in App. 7A.5, the smallest eigenvalue is 3.786 and the largest eigenvalue is 187.5269, giving k = 187.5269/3.786, or about 49.53. Therefore CI = $\sqrt{49.53} \approx 7.04$. Both k and CI suggest that we do not have a serious collinearity problem. Incidentally, note that a low eigenvalue (in relation to the maximum eigenvalue) is generally an indication of near-linear dependencies in the data.
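The condition number and condition index are one-line computations once the eigenvalues are available. The first part of the sketch reproduces the App. 7A.5 figures; the second shows one way to obtain eigenvalues from a regressor matrix. Scaling each column to unit length before forming $X'X$ is an assumption (a convention associated with Belsley, Kuh, and Welsch); EViews, Stata, and other packages may scale differently and so report different numbers. The helper names are hypothetical.

```python
import numpy as np


def condition_diagnostics(eigenvalues):
    """Condition number k = max/min eigenvalue and condition index CI = sqrt(k)."""
    eig = np.asarray(eigenvalues, dtype=float)
    k = eig.max() / eig.min()
    return k, float(np.sqrt(k))


def severity(k):
    """Rule of thumb from the text, stated in terms of k."""
    if k > 1000:
        return "severe"
    if k >= 100:
        return "moderate to strong"
    return "not serious"


# Extreme eigenvalues reported for the App. 7A.5 example
k, ci = condition_diagnostics([187.5269, 3.786])
print(f"k = {k:.2f}, CI = {ci:.4f}, collinearity: {severity(k)}")


def xtx_eigenvalues(X, unit_length=True):
    """Eigenvalues of X'X; include a column of ones in X if the intercept is wanted.

    Assumption: with unit_length=True each column is first scaled to unit length.
    """
    X = np.asarray(X, dtype=float)
    if unit_length:
        X = X / np.linalg.norm(X, axis=0)
    return np.linalg.eigvalsh(X.T @ X)


# Usage on Table 10.5 (constant, X2, X3); the numbers depend on the scaling convention
X2 = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
X3 = np.array([810, 1009, 1273, 1425, 1633, 1876, 2052, 2201, 2435, 2686], dtype=float)
k_data, ci_data = condition_diagnostics(xtx_eigenvalues(np.column_stack([np.ones(10), X2, X3])))
print(f"Table 10.5: k = {k_data:.1f}, CI = {ci_data:.1f}, collinearity: {severity(k_data)}")
```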

Some authors believe that the condition index is the best available multicollinearity diagnostic. But this opinion is not shared widely. For us, then, the CI is just a rule of thumb, a bit more sophisticated perhaps. But for further details, the reader may consult the references.²⁶

6. Tolerance and variance inflation factor. We have already introduced TOL and VIF. As $R_j^2$, the coefficient of determination in the regression of regressor $X_j$ on the remaining regressors in the model, increases toward unity, that is, as the collinearity of $X_j$ with the other regressors increases, VIF also increases, and in the limit it can be infinite.

Some authors therefore use the VIF as an indicator of multicollinearity. The larger the value of $\text{VIF}_j$, the more “troublesome” or collinear the variable $X_j$. As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if $R_j^2$ exceeds 0.90, that variable is said to be highly collinear.²⁷

Of course, one could use $\text{TOL}_j$ as a measure of multicollinearity in view of its intimate connection with $\text{VIF}_j$. The closer $\text{TOL}_j$ is to zero, the greater the degree of collinearity of that variable with the other regressors. On the other hand, the closer $\text{TOL}_j$ is to 1, the greater the evidence that $X_j$ is not collinear with the other regressors.
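VIF and TOL come straight from the auxiliary regressions. The sketch below computes them for the two regressors of Table 10.5; `vif_and_tol` is a hypothetical helper written with NumPy. With two regressors the two VIFs coincide, and here they are far above the rule-of-thumb cutoff of 10.

```python
import numpy as np


def vif_and_tol(X):
    """Return [(R_j^2, VIF_j, TOL_j), ...] for each column of X.

    X holds the regressors only (no column of ones); each auxiliary regression
    adds its own intercept. VIF_j = 1 / (1 - R_j^2) and TOL_j = 1 / VIF_j.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - float(resid @ resid) / float(((target - target.mean()) ** 2).sum())
        vif = 1 / (1 - r2)
        out.append((r2, vif, 1 / vif))
    return out


# Table 10.5 regressors: income X2 and wealth X3
X2 = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
X3 = np.array([810, 1009, 1273, 1425, 1633, 1876, 2052, 2201, 2435, 2686], dtype=float)

results = vif_and_tol(np.column_stack([X2, X3]))
for name, (r2, vif, tol) in zip(["X2", "X3"], results):
    flag = "highly collinear (VIF > 10)" if vif > 10 else "VIF <= 10"
    print(f"{name}: R_j^2 = {r2:.4f}, VIF = {vif:.1f}, TOL = {tol:.4f}, {flag}")
```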

VIF (or tolerance) as a measure of collinearity is not free of criticism. As Eq. (10.5.4) shows, $\text{var}(\hat{\beta}_j)$ depends on three factors: $\sigma^2$, $\sum x_j^2$, and $\text{VIF}_j$. A high VIF can be counterbalanced by a low $\sigma^2$ or a high $\sum x_j^2$. To put it differently, a high VIF is neither necessary nor sufficient to get high variances and high standard errors. Therefore, high multicollinearity, as measured by a high VIF, may not necessarily cause high standard errors. In all this discussion, the terms high and low are used in a relative sense.
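The criticism rests on the three-factor decomposition of $\text{var}(\hat{\beta}_j)$, written here as $\sigma^2 / \sum x_j^2 \times \text{VIF}_j$ with $x_j$ in deviation form. The inputs below are invented purely to show that a regressor with a VIF above 10 can still have a smaller variance than one with almost no collinearity; `var_beta_j` is a hypothetical helper.

```python
def var_beta_j(sigma2, sum_xj_sq, vif_j):
    """var(beta_j_hat) = sigma^2 / sum(x_j^2) * VIF_j, with x_j in deviation form."""
    return sigma2 / sum_xj_sq * vif_j


# Hypothetical inputs, chosen only to show the offsetting effect
case_high_vif = var_beta_j(sigma2=1.0, sum_xj_sq=10_000.0, vif_j=20.0)  # VIF above 10
case_low_vif = var_beta_j(sigma2=4.0, sum_xj_sq=100.0, vif_j=1.5)       # little collinearity

print(f"high VIF: var = {case_high_vif:.4f}; low VIF: var = {case_low_vif:.4f}")
```

The large $\sum x_j^2$ and small $\sigma^2$ in the first case more than offset its VIF of 20.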

26 See especially D. A. Belsley, E. Kuh, and R. E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, John Wiley & Sons, New York, 1980, Chapter 3. However, this book is not for the beginner.

27 See David C. Kleinbaum, Lawrence L. Kupper, and Keith E. Muller, Applied Regression Analysis and Other Multivariate Methods, 2d ed., PWS-Kent, Boston, Mass., 1988, p. 210.


FIGURE 10.4 Scatterplot for Example 10.2 data.


7. Scatterplot. It is a good practice to use a scatterplot to see how the various variables in a regression model are related. Figure 10.4 presents the scatterplot for the U.S. consumption example discussed in the previous section (Example 10.2). This is a four-by-four box diagram because we have four variables in the model, a dependent variable (C) and three explanatory variables: real disposable personal income (Yd), real wealth (W), and real interest rate (I).

First consider the main diagonal, going from the upper left-hand corner to the lower 
right-hand corner. There are no scatterpoints in these boxes that lie on the main diagonal. If 
there were, they would have a correlation coefficient of 1, for the plots would be of a given 
variable against itself. The off-diagonal boxes show intercorrelations among the variables. 
Take, for instance, the wealth box (W). It shows that wealth and income are highly correlated (the correlation coefficient between the two is 0.97), but not perfectly so. If they were perfectly correlated (i.e., if they had a correlation coefficient of 1), we would not have been able to estimate the regression (10.6.6) because we would have an exact linear relationship between wealth and income. The scatterplot also shows that the interest rate is not highly correlated with the other three variables.

Since the scatterplot function is now included in several statistical packages, this diagnostic should be considered along with the ones discussed earlier. But keep in mind that simple correlations between pairs of variables may not be a definitive indicator of collinearity, as pointed out earlier.

To conclude our discussion of detecting multicollinearity, we stress that the various 
methods we have discussed are essentially in the nature of “fishing expeditions,” for we 
cannot tell which of these methods will work in any particular application. Alas, not much 
can be done about it, for multicollinearity is specific to a given sample over which the 
researcher may not have much control, especially if the data are nonexperimental in 
nature—the usual fate of researchers in the social sciences. 

Again as a parody of multicollinearity, Goldberger cites numerous ways of detecting micronumerosity, such as developing critical values of the sample size, n*, such that micronumerosity is a problem only if the actual sample size, n, is smaller than n*. The point of Goldberger’s parody is to emphasize that small sample size and lack of variability in the explanatory variables may cause problems that are at least as serious as those due to multicollinearity.
10.8 Remedial Measures


What can be done if multicollinearity is serious? We have two choices: (1) do nothing or 
(2) follow some rules of thumb. 

Do Nothing 

The “do nothing” school of thought is expressed by Blanchard as follows: 28 

When students run their first ordinary least squares (OLS) regression, the first problem that they 
usually encounter is that of multicollinearity. Many of them conclude that there is something 
wrong with OLS; some resort to new and often creative techniques to get around the problem. 
But, we tell them, this is wrong. Multicollinearity is God’s will, not a problem with OLS or 
statistical technique in general. 

What Blanchard is saying is that multicollinearity is essentially a data deficiency problem (micronumerosity, again) and sometimes we have no choice over the data we have available for empirical analysis.

Also, multicollinearity does not mean that all the coefficients in a regression model will be statistically insignificant. Moreover, even if we cannot estimate one or more regression coefficients with greater precision, a linear combination of them (i.e., an estimable function) can be estimated relatively efficiently. As we saw in Eq. (10.2.3), we can estimate α uniquely, even if we cannot estimate its two components given there individually. Sometimes this is the best we can do with a given set of data.²⁹

Rule-of-Thumb Procedures 

One can try the following rules of thumb to address the problem of multicollinearity; their 
success will depend on the severity of the collinearity problem. 

1. A priori information. Suppose we consider the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

where Y = consumption, $X_2$ = income, and $X_3$ = wealth. As noted before, income and wealth variables tend to be highly collinear. But suppose a priori we believe that $\beta_3 = 0.10\beta_2$; that is, the rate of change of consumption with respect to wealth is one-tenth the corresponding rate with respect to income. We can then run the following regression:

$$
\begin{aligned}
Y_i &= \beta_1 + \beta_2 X_{2i} + 0.10\beta_2 X_{3i} + u_i \\
&= \beta_1 + \beta_2 X_i + u_i
\end{aligned}
$$

where $X_i = X_{2i} + 0.1X_{3i}$. Once we obtain $\hat{\beta}_2$, we can estimate $\beta_3$ from the postulated relationship between $\beta_2$ and $\beta_3$.
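The restricted regression is an ordinary two-variable regression on the constructed regressor $X_i = X_{2i} + 0.1X_{3i}$. The sketch applies it to the Table 10.5 data; the restriction $\beta_3 = 0.10\beta_2$ is the text's hypothetical prior belief, not something these data establish, and the variable names are illustrative.

```python
import numpy as np

# Table 10.5 data
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X2 = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
X3 = np.array([810, 1009, 1273, 1425, 1633, 1876, 2052, 2201, 2435, 2686], dtype=float)

LAMBDA = 0.10                                  # assumed a priori ratio beta3 / beta2

X_combined = X2 + LAMBDA * X3                  # X_i = X2i + 0.1 X3i
A = np.column_stack([np.ones_like(Y), X_combined])
(b1, b2), *_ = np.linalg.lstsq(A, Y, rcond=None)
b3 = LAMBDA * b2                               # recovered from the restriction
print(f"beta1 = {b1:.4f}, beta2 = {b2:.4f}, implied beta3 = {b3:.4f}")
```

Because the restriction forces $\hat{\beta}_3$ to share the sign of $\hat{\beta}_2$, the wrong-signed wealth coefficient of (10.6.1) cannot reappear, which is exactly why such a restriction should be tested rather than simply imposed.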

How does one obtain a priori information? It could come from previous empirical work in which the collinearity problem happens to be less serious or from the relevant theory underlying the field of study. For example, in the Cobb-Douglas-type production function (7.9.1), if one expects constant returns to scale to prevail, then $(\beta_2 + \beta_3) = 1$, in which case we could run the regression (8.6.14), regressing the output-labor ratio on the capital-labor ratio. If there is collinearity between labor and capital, as generally is the case in most sample data, such a transformation may reduce or eliminate the collinearity problem. But a warning is in order here regarding imposing such a priori restrictions, “. . . since in general we will want to test economic theory’s a priori predictions rather than simply impose them on data for which they may not be true.”³⁰ However, we know from Section 8.6 how to test for the validity of such restrictions explicitly.


28 O. J. Blanchard, Comment, Journal of Business and Economic Statistics, vol. 5, 1987, pp. 449-451. The quote is reproduced from Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 190.

29 For an interesting discussion on this, see J. Conlisk, "When Collinearity Is Desirable," Western Economic Journal, vol. 9, 1971, pp. 393-407.

2. Combining cross-sectional and time series data. A variant of the extraneous or a priori information technique is the combination of cross-sectional and time series data, known as pooling the data. Suppose we want to study the demand for automobiles in the United States and assume we have time series data on the number of cars sold, average price of the car, and consumer income. Suppose also that

$$\ln Y_t = \beta_1 + \beta_2 \ln P_t + \beta_3 \ln I_t + u_t$$

where Y = number of cars sold, P = average price, I = income, and t = time. Our objective is to estimate the price elasticity, $\beta_2$, and income elasticity, $\beta_3$.

In time series data the price and income variables generally tend to be highly collinear. Therefore, if we run the preceding regression, we shall be faced with the usual multicollinearity problem. A way out of this has been suggested by Tobin.³¹ He says that if we have cross-sectional data (for example, data generated by consumer panels, or budget studies conducted by various private and governmental agencies), we can obtain a fairly reliable estimate of the income elasticity $\beta_3$ because in such data, which are at a point in time, the prices do not vary much. Let the cross-sectionally estimated income elasticity be $\hat{\beta}_3$. Using this estimate, we may write the preceding time series regression as

$$Y_t^* = \beta_1 + \beta_2 \ln P_t + u_t$$

where $Y^* = \ln Y - \hat{\beta}_3 \ln I$, that is, $Y^*$ represents that value of Y after removing from it the effect of income. We can now obtain an estimate of the price elasticity $\beta_2$ from the preceding regression.
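Tobin's two-step procedure can be sketched with simulated data. Everything below is hypothetical: the trends, the "true" elasticities, and the cross-sectional estimate `b3_cross`, which is set equal to the true income elasticity so that the mechanics are easy to see.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 40

# Hypothetical time series in which log price and log income share a trend
trend = np.linspace(0.0, 1.0, T)
ln_I = 3.0 + trend + 0.01 * rng.standard_normal(T)
ln_P = 1.0 + 0.8 * trend + 0.01 * rng.standard_normal(T)
TRUE_B2, TRUE_B3 = -1.2, 1.5    # assumed "true" elasticities for the simulation
ln_Y = 2.0 + TRUE_B2 * ln_P + TRUE_B3 * ln_I + 0.02 * rng.standard_normal(T)

# Step 1: income elasticity taken from a (hypothetical) cross-sectional study
b3_cross = 1.5

# Step 2: remove the income effect and regress Y* on ln P alone
y_star = ln_Y - b3_cross * ln_I
A = np.column_stack([np.ones(T), ln_P])
(b1_hat, b2_hat), *_ = np.linalg.lstsq(A, y_star, rcond=None)

# For contrast: the unrestricted time series regression with both collinear regressors
B = np.column_stack([np.ones(T), ln_P, ln_I])
coef_full, *_ = np.linalg.lstsq(B, ln_Y, rcond=None)

print("corr(ln P, ln I) =", round(float(np.corrcoef(ln_P, ln_I)[0, 1]), 4))
print("price elasticity from the Y* regression:", round(float(b2_hat), 3))
print("unrestricted (b2, b3):", coef_full[1:].round(3))
```

If `b3_cross` differs from the income elasticity that governs the time series, the discrepancy passes straight into $Y^*$ and biases the price elasticity, which is the interpretation problem the text raises about pooling.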

Although it is an appealing technique, pooling the time series and cross-sectional data in the manner just suggested may create problems of interpretation, because we are assuming implicitly that the cross-sectionally estimated income elasticity is the same thing as that which would be obtained from a pure time series analysis.³² Nonetheless, the technique has been used in many applications and is worthy of consideration in situations where the cross-sectional estimates do not vary substantially from one cross section to another. An example of this technique is provided in Exercise 10.26.

3. Dropping a variable(s) and specification bias. When faced with severe multicollinearity, one of the “simplest” things to do is to drop one of the collinear variables.


30 Mark B. Stewart and Kenneth F. Wallis, Introductory Econometrics, 2d ed., John Wiley & Sons, A 
Halstead Press Book, New York, 1981, p. 154. 

31 J. Tobin, "A Statistical Demand Function for Food in the U.S.A.," Journal of the Royal Statistical Society, Ser. A, 1950, pp. 113-141.

32 For a thorough discussion and application of the pooling technique, see Edwin Kuh, Capital Stock 
Growth: A Micro-Econometric Approach, North-Holland Publishing Company, Amsterdam, 1963, 
Chapters 5 and 6. 
Thus, in our consumption-income-wealth illustration, when we drop the wealth variable,
we obtain regression (10.6.4), which shows that, whereas in the original model the income 
variable was statistically insignificant, it is now “highly” significant. 

But in dropping a variable from the model we may be committing a specification bias or specification error. Specification bias arises from incorrect specification of the model used in the analysis. Thus, if economic theory says that income and wealth should both be included in the model explaining the consumption expenditure, dropping the wealth variable would constitute specification bias.

Although we will discuss the topic of specification bias in Chapter 13, we caught a glimpse of it in Section 7.7. If, for example, the true model is

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

but we mistakenly fit the model

$$Y_i = b_1 + b_{12} X_{2i} + \hat{u}_i \qquad (10.8.1)$$

then it can be shown that (see Appendix 13A.1)

$$E(b_{12}) = \beta_2 + \beta_3 b_{32} \qquad (10.8.2)$$

where $b_{32}$ = slope coefficient in the regression of $X_3$ on $X_2$. Therefore, it is obvious from Eq. (10.8.2) that $b_{12}$ will be a biased estimate of $\beta_2$ as long as $b_{32}$ is different from zero (it is assumed that $\beta_3$ is different from zero; otherwise there is no sense in including $X_3$ in the original model).³³ Of course, if $b_{32}$ is zero, we have no multicollinearity problem to begin with. It is also clear from Eq. (10.8.2) that if both $b_{32}$ and $\beta_3$ are positive (or both are negative), $E(b_{12})$ will be greater than $\beta_2$; hence, on the average $b_{12}$ will overestimate $\beta_2$, leading to a positive bias. Similarly, if the product $b_{32}\beta_3$ is negative, on the average $b_{12}$ will underestimate $\beta_2$, leading to a negative bias.

From the preceding discussion it is clear that dropping a variable from the model to
alleviate the problem of multicollinearity may lead to specification bias. Hence the remedy
may be worse than the disease in some situations because, whereas multicollinearity
may prevent precise estimation of the parameters of the model, omitting a variable may
seriously mislead us as to the true values of the parameters. Recall that OLS estimators are
BLUE despite near collinearity.
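The algebra behind Eq. (10.8.2) can be checked numerically. The sketch below is illustrative only: the parameter values and the income and wealth figures are hypothetical, not data from the text, and the helper `slope` is our own. Because the data are generated without a disturbance term, the slope from the misspecified model (10.8.1) equals $\beta_2 + \beta_3 b_{32}$ exactly; with a disturbance added, the same relation holds in expectation.

```python
# Hypothetical illustration of omitted-variable bias, Eq. (10.8.2).

def slope(y, x):
    """OLS slope from the two-variable regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Assumed "true" parameters of Y = beta1 + beta2*X2 + beta3*X3 (no disturbance).
beta1, beta2, beta3 = 10.0, 0.8, 0.5
x2 = [20, 25, 31, 38, 40, 47, 55, 60, 66, 72]             # hypothetical income
x3 = [105, 130, 150, 200, 210, 240, 290, 305, 330, 370]   # hypothetical wealth
y = [beta1 + beta2 * a + beta3 * b for a, b in zip(x2, x3)]

b12 = slope(y, x2)    # slope of the misspecified model (10.8.1)
b32 = slope(x3, x2)   # slope of the auxiliary regression of X3 on X2
bias = b12 - beta2    # equals beta3 * b32

print(f"b12 = {b12:.4f}  beta2 = {beta2}  b32 = {b32:.4f}")
print(f"bias = {bias:.4f}  beta3 * b32 = {beta3 * b32:.4f}")
```

Since $b_{32}$ and $\beta_3$ are both positive in this made-up example, $b_{12}$ overstates $\beta_2$, which is the positive-bias case described above.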

4. Transformation of variables. Suppose we have time series data on consumption
expenditure, income, and wealth. One reason for high multicollinearity between income
and wealth in such data is that over time both variables tend to move in the same
direction. One way of minimizing this dependence is to proceed as follows.

If the relation 


$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t \qquad (10.8.3)$$

holds at time $t$, it must also hold at time $t - 1$ because the origin of time is arbitrary
anyway. Therefore, we have

$$Y_{t-1} = \beta_1 + \beta_2 X_{2,t-1} + \beta_3 X_{3,t-1} + u_{t-1} \qquad (10.8.4)$$

If we subtract Eq. (10.8.4) from Eq. (10.8.3), we obtain 

$$Y_t - Y_{t-1} = \beta_2 (X_{2t} - X_{2,t-1}) + \beta_3 (X_{3t} - X_{3,t-1}) + v_t \qquad (10.8.5)$$


33 Note further that if $b_{32}$ does not approach zero as the sample size is increased indefinitely, then $b_{12}$
will be not only biased but also inconsistent.
where $v_t = u_t - u_{t-1}$. Equation (10.8.5) is known as the first difference form because we
run the regression not on the original variables but on the differences of successive values
of the variables.

The first difference regression model often reduces the severity of multicollinearity
because, although the levels of $X_2$ and $X_3$ may be highly correlated, there is no a priori
reason to believe that their differences will also be highly correlated.
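To see why differencing can help, the sketch below compares the correlation between two trending series in levels with the correlation between their first differences. The two series are hypothetical (invented for illustration, not taken from the text), and `corr` and `first_diff` are helper names of our own. It also shows that differencing leaves one fewer observation, a point taken up below.

```python
from math import sqrt

def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)

def first_diff(series):
    """Return the successive differences x_t - x_{t-1}, t = 2, ..., n."""
    return [cur - prev for prev, cur in zip(series, series[1:])]

# Hypothetical income (X2) and wealth (X3) series that both trend upward.
x2 = [100, 104, 107, 113, 116, 121, 123, 130, 132, 138]
x3 = [200, 205, 212, 215, 221, 230, 233, 238, 247, 250]

d2, d3 = first_diff(x2), first_diff(x3)
corr_levels = corr(x2, x3)
corr_diffs = corr(d2, d3)

print(f"levels: r = {corr_levels:.3f} (n = {len(x2)})")
print(f"first differences: r = {corr_diffs:.3f} (n = {len(d2)})")
```

For these particular numbers the levels are correlated above 0.99, while the differenced series are only weakly (and negatively) correlated. With other data the reduction may be smaller, which is why differencing only *often* helps.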

As we shall see in the chapters on time series econometrics, an incidental advantage of
the first difference transformation is that it may make a nonstationary time series
stationary. In those chapters we will see the importance of stationary time series. As noted in
Chapter 1, loosely speaking, a time series, say, $Y_t$, is stationary if its mean and variance do
not change systematically over time.

Another commonly used transformation in practice is the ratio transformation. Consider
the model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t \qquad (10.8.6)$$

where $Y$ is consumption expenditure in real dollars, $X_2$ is GDP, and $X_3$ is total population.
Since GDP and population grow over time, they are likely to be correlated. One “solution”
to this problem is to express the model on a per capita basis, that is, by dividing Eq. (10.8.6)
by $X_3$, to obtain:

$$\frac{Y_t}{X_{3t}} = \beta_1 \left(\frac{1}{X_{3t}}\right) + \beta_2 \left(\frac{X_{2t}}{X_{3t}}\right) + \beta_3 + \left(\frac{u_t}{X_{3t}}\right) \qquad (10.8.7)$$

Such a transformation may reduce collinearity in the original variables. 

But the first difference or ratio transformations are not without problems. For instance,
the error term $v_t$ in Eq. (10.8.5) may not satisfy one of the assumptions of the classical
linear regression model, namely, that the disturbances are serially uncorrelated. As we will see
in Chapter 12, if the original disturbance term $u_t$ is serially uncorrelated, the error term $v_t$
obtained previously will in most cases be serially correlated. Therefore, the remedy may be
worse than the disease. Moreover, there is a loss of one observation due to the differencing
procedure, and therefore the degrees of freedom are reduced by one. In a small sample, this
could be a factor one would wish at least to take into consideration. Furthermore, the
first-differencing procedure may not be appropriate in cross-sectional data where there is no
logical ordering of the observations.
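The serial-correlation problem with $v_t$ is easy to quantify in the simplest case. If $u_t$ is serially uncorrelated with constant variance $\sigma^2$, then $\operatorname{var}(v_t) = 2\sigma^2$ and $\operatorname{cov}(v_t, v_{t-1}) = -\sigma^2$, so the first-order autocorrelation of $v_t$ is $-0.5$. The simulation below checks this; the normal distribution for $u_t$, the seed, and the helper `lag1_autocorr` are assumptions made only for illustration.

```python
import random

def lag1_autocorr(e):
    """Sample first-order autocorrelation coefficient of a series."""
    n = len(e)
    m = sum(e) / n
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, n))
    den = sum((x - m) ** 2 for x in e)
    return num / den

rng = random.Random(42)                            # fixed seed for reproducibility
u = [rng.gauss(0.0, 1.0) for _ in range(20_000)]   # serially uncorrelated u_t
v = [u[t] - u[t - 1] for t in range(1, len(u))]    # v_t = u_t - u_{t-1}

r_u = lag1_autocorr(u)
r_v = lag1_autocorr(v)
print(f"lag-1 autocorrelation of u: {r_u:.3f}")   # should be near 0
print(f"lag-1 autocorrelation of v: {r_v:.3f}")   # should be near -0.5
```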

Similarly, in the ratio model (10.8.7), the error term

$$\frac{u_t}{X_{3t}}$$

will be heteroscedastic if the original error term $u_t$ is homoscedastic, as we shall see in
Chapter 11. Again, the remedy may be worse than the disease of collinearity.
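The reason the ratio error is heteroscedastic can be seen directly: if $\operatorname{var}(u_t) = \sigma^2$, then $\operatorname{var}(u_t/X_{3t}) = \sigma^2/X_{3t}^2$, which changes with $X_{3t}$. The simulation below is a sketch, not part of the original example. It assumes normal disturbances with $\sigma = 1$ and two hypothetical population values.

```python
import random
from statistics import pvariance

rng = random.Random(7)
sigma = 1.0
pop_small, pop_large = 2.0, 10.0   # hypothetical values of X3 (population)

# The same homoscedastic disturbance process generates both groups ...
u_a = [rng.gauss(0.0, sigma) for _ in range(5000)]
u_b = [rng.gauss(0.0, sigma) for _ in range(5000)]

# ... but after dividing by X3 the error variance depends on X3.
var_small = pvariance([u / pop_small for u in u_a])   # theory: sigma^2 / 4
var_large = pvariance([u / pop_large for u in u_b])   # theory: sigma^2 / 100

print(f"var(u/X3) with X3 = {pop_small}: {var_small:.4f}")
print(f"var(u/X3) with X3 = {pop_large}: {var_large:.4f}")
```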

In short, one should be careful in using the first difference or ratio method of
transforming the data to resolve the problem of multicollinearity.

5. Additional or new data. Since multicollinearity is a sample feature, it is possible
that in another sample involving the same variables collinearity may not be so serious as in
the first sample. Sometimes simply increasing the size of the sample (if possible) may
attenuate the collinearity problem. For example, in the three-variable model we saw that

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \,(1 - r_{23}^2)}$$

Now as the sample size increases, $\sum x_{2i}^2$ will generally increase. (Why?) Therefore, for any
given $r_{23}$, the variance of $\hat{\beta}_2$ will decrease, thus decreasing the standard error, which will
enable us to estimate $\beta_2$ more precisely.
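The formula makes the role of sample size concrete. In the sketch below (the values of $\sigma^2$, $r_{23}$, and $\sum x_{2i}^2$ are hypothetical), quadrupling $\sum x_{2i}^2$, as might happen with a larger sample, cuts $\operatorname{var}(\hat\beta_2)$ to one-quarter for the same $r_{23}$. The ratio to the $r_{23} = 0$ case is the variance-inflation factor $1/(1 - r_{23}^2)$.

```python
def var_beta2(sigma2, sum_x2_sq, r23):
    """var(beta2_hat) = sigma^2 / (sum x_2i^2 * (1 - r23^2)); x_2i in deviation form."""
    return sigma2 / (sum_x2_sq * (1.0 - r23 ** 2))

sigma2, r23 = 4.0, 0.95            # hypothetical error variance and correlation

v_small = var_beta2(sigma2, 50.0, r23)          # hypothetical small-sample sum of squares
v_large = var_beta2(sigma2, 200.0, r23)         # four times larger sum of squares
vif = v_small / var_beta2(sigma2, 50.0, 0.0)    # inflation relative to r23 = 0

print(f"var with sum x2^2 = 50:  {v_small:.4f}")
print(f"var with sum x2^2 = 200: {v_large:.4f}")
print(f"variance-inflation factor: {vif:.2f}")
```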

As an illustration, consider the following regression of consumption expenditure $Y$ on
income $X_2$ and wealth $X_3$ based on 10 observations:34

$$\hat{Y}_i = 24.377 + 0.8716 X_{2i} - 0.0349 X_{3i} \qquad (10.8.8)$$
$$t = (3.875) \quad (2.7726) \quad (-1.1595) \qquad R^2 = 0.9682$$


The wealth coefficient in this regression not only has the wrong sign but is also statistically
insignificant at the 5 percent level. But when the sample size was increased to 40
observations (micronumerosity?), the following results were obtained:


$$\hat{Y}_i = 2.0907 + 0.7299 X_{2i} + 0.0605 X_{3i}$$
$$t = (0.8713) \quad (6.0014) \quad (2.0014) \qquad R^2 = 0.9672$$


Now the wealth coefficient not only has the correct sign but also is statistically significant 
at the 5 percent level. 

Obtaining additional or “better” data is not always that easy, for as Judge et al. note: 
Unfortunately, economists seldom can obtain additional data without bearing large costs, 
much less choose the values of the explanatory variables they desire. In addition, when adding 
new variables in situations that are not controlled, we must be aware of adding observations 
that were generated by a process other than that associated with the original data set; that is, 
we must be sure that the economic structure associated with the new observations is the same 
as the original structure. 35 

6. Reducing collinearity in polynomial regressions. In Section 7.10 we discussed
polynomial regression models. A special feature of these models is that the explanatory
variable(s) appears with various powers. Thus, in the total cubic cost function involving the
regression of total cost on output, (output)², and (output)³, as in Eq. (7.10.4), the various
output terms are going to be correlated, making it difficult to estimate the various slope
coefficients precisely.36 In practice though, it has been found that if the explanatory
variable(s) is expressed in the deviation form (i.e., deviation from the mean value),
multicollinearity is substantially reduced. But even then the problem may persist,37 in
which case one may want to consider techniques such as orthogonal polynomials.38
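The effect of expressing output in deviation form can be checked with a few lines of code. The output levels 1 to 10 below are hypothetical, and `corr` is our own helper. In raw form, output and (output)² are very highly correlated. After centering, the correlation between the deviation and its square is zero here only because the centered values happen to be symmetric about zero. The correlation between the deviation and its cube stays high, which is one way the problem can persist.

```python
from math import sqrt

def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)

output = [float(q) for q in range(1, 11)]     # hypothetical output levels
mean_q = sum(output) / len(output)
dev = [q - mean_q for q in output]            # deviation form

r_raw = corr(output, [q ** 2 for q in output])   # output vs (output)^2
r_dev_sq = corr(dev, [d ** 2 for d in dev])      # deviation vs deviation^2
r_dev_cube = corr(dev, [d ** 3 for d in dev])    # deviation vs deviation^3

print(f"corr(X, X^2)  raw:      {r_raw:.4f}")
print(f"corr(x, x^2)  centered: {r_dev_sq:.4f}")
print(f"corr(x, x^3)  centered: {r_dev_cube:.4f}")
```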

7. Other methods of remedying multicollinearity. Multivariate statistical techniques
such as factor analysis and principal components or techniques such as ridge regression
are often employed to “solve” the problem of multicollinearity. Unfortunately, these
techniques are beyond the scope of this book, for they cannot be discussed competently
without resorting to matrix algebra.39


34 I am indebted to the late Albert Zucker for providing the results given in the following regressions.
35 Judge et al., op. cit., p. 625. See also Section 10.9.

36 As noted, since the relationship between X, X², and X³ is nonlinear, polynomial regressions do not
violate the assumption of no multicollinearity of the classical model, strictly speaking.

37 See R. A. Bradley and S. S. Srivastava, "Correlation and Polynomial Regression," American
Statistician, vol. 33, 1979, pp. 11-14.

38 See Norman Draper and Harry Smith, Applied Regression Analysis, 2d ed., John Wiley & Sons, New
York, 1981, pp. 266-274.

39 A readable account of these techniques from an applied viewpoint can be found in Samprit
Chatterjee and Bertram Price, Regression Analysis by Example, John Wiley & Sons, New York, 1977, Chapters 7
and 8. See also H. D. Vinod, "A Survey of Ridge Regression and Related Techniques for Improvements
over Ordinary Least Squares," Review of Economics and Statistics, vol. 60, February 1978, pp. 121-131.
10.9 Is Multicollinearity Necessarily Bad? Maybe Not, If the Objective Is Prediction Only

It has been said that if the sole purpose of regression analysis is prediction or forecasting,
then multicollinearity is not a serious problem because the higher the $R^2$, the better the
prediction.40 But this may be so “. . . as long as the values of the explanatory variables for
which predictions are desired obey the same near-exact linear dependencies as the original
design [data] matrix X.”41 Thus, if in an estimated regression it was found that $X_2 \approx 2X_3$,
then in a future sample used to forecast $Y$, $X_2$ should also be approximately
equal to $2X_3$, a condition difficult to meet in practice (see footnote 35), in which case
prediction will become increasingly uncertain.42 Moreover, if the objective of the analysis
is not only prediction but also reliable estimation of the parameters, serious
multicollinearity will be a problem because we have seen that it leads to large standard errors of the
estimators.
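The requirement that future values obey the same near-exact dependency can be illustrated with prediction variances. For OLS, $\operatorname{var}(\hat{Y}_0) = \sigma^2\,\mathbf{x}_0'(X'X)^{-1}\mathbf{x}_0$, where $\mathbf{x}_0$ holds the regressor values at which we predict. The sketch below uses hypothetical data in which $X_2 \approx 2X_3$. It compares the factor $\mathbf{x}_0'(X'X)^{-1}\mathbf{x}_0$ at a point that respects this relation with one that violates it. The helpers `solve` and `pred_var_factor` are our own.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Hypothetical sample with X2 = 2*X3 +/- 0.1 (near-exact collinearity).
x3 = [float(i) for i in range(1, 11)]
x2 = [2 * v + (0.1 if i % 2 == 0 else -0.1) for i, v in enumerate(x3)]
rows = [[1.0, a, b] for a, b in zip(x2, x3)]           # columns: 1, X2, X3
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

def pred_var_factor(x0):
    """x0' (X'X)^{-1} x0, so that var(Y0_hat) = sigma^2 times this factor."""
    z = solve(xtx, x0)
    return sum(p * q for p, q in zip(x0, z))

h_on = pred_var_factor([1.0, 10.0, 5.0])    # X2 = 2*X3: same structure
h_off = pred_var_factor([1.0, 10.0, 8.0])   # X2 far from 2*X3: structure broken

print(f"factor when X2 = 2*X3 holds: {h_on:.4f}")
print(f"factor when it does not:     {h_off:.2f}")
```

With these numbers the factor is about 0.10 in the first case and about 370 in the second. The same fitted regression therefore predicts far less reliably once the collinearity pattern breaks.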

In one situation, however, multicollinearity may not pose a serious problem. This is the
case when $R^2$ is high and the regression coefficients are individually significant as revealed
by high $t$ values, yet multicollinearity diagnostics, say, the condition index, indicate
that there is serious collinearity in the data. When can such a situation arise? As Johnston
notes:

This can arise if individual coefficients happen to be numerically well in excess of the true 
value, so that the effect still shows up in spite of the inflated standard error and/or because the 
true value itself is so large that even an estimate on the downside still shows up as significant. 43 


10.10 An Extended Example: The Longley Data 

We conclude this chapter by analyzing the data collected by Longley.44 Although originally
collected to assess the computational accuracy of least-squares estimates in several
computer programs, the Longley data have become the workhorse to illustrate several
econometric problems, including multicollinearity. The data are reproduced in Table 10.8. The
data are time series for the years 1947-1962 and pertain to $Y$ = number of people
employed, in thousands; $X_1$ = GNP implicit price deflator; $X_2$ = GNP, millions of dollars;
$X_3$ = number of people unemployed, in thousands; $X_4$ = number of people in the armed
forces; $X_5$ = noninstitutionalized population over 14 years of age; and $X_6$ = year, equal to
1 in 1947, 2 in 1948, ..., and 16 in 1962.


40 See R. C. Geary, "Some Results about Relations between Stochastic Variables: A Discussion
Document," Review of International Statistical Institute, vol. 31, 1963, pp. 163-181.

41 Judge et al., op. cit., p. 619. You will also find on this page proof of why, despite collinearity, one
can obtain better mean predictions if the existing collinearity structure also continues in the future.

42 For an excellent discussion, see E. Malinvaud, Statistical Methods of Econometrics, 2d ed.,
North-Holland Publishing Company, Amsterdam, 1970, pp. 220-221.

43 J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, p. 249.

44 J. Longley, "An Appraisal of Least-Squares Programs from the Point of the User," Journal of the
American Statistical Association, vol. 62, 1967, pp. 819-841.
TABLE 10.8
Longley Data

Observation       Y      X1        X2      X3      X4        X5  Time
1947         60,323     830   234,289   2,356   1,590   107,608     1
1948         61,122     885   259,426   2,325   1,456   108,632     2
1949         60,171     882   258,054   3,682   1,616   109,773     3
1950         61,187     895   284,599   3,351   1,650   110,929     4
1951         63,221     962   328,975   2,099   3,099   112,075     5
1952         63,639     981   346,999   1,932   3,594   113,270     6
1953         64,989     990   365,385   1,870   3,547   115,094     7
1954         63,761   1,000   363,112   3,578   3,350   116,219     8
1955         66,019   1,012   397,469   2,904   3,048   117,388     9
1956         67,857   1,046   419,180   2,822   2,857   118,734    10
1957         68,169   1,084   442,769   2,936   2,798   120,445    11
1958         66,513   1,108   444,546   4,681   2,637   121,950    12
1959         68,655   1,126   482,704   3,813   2,552   123,366    13
1960         69,564   1,142   502,601   3,931   2,514   125,368    14
1961         69,331   1,157   518,173   4,806   2,572   127,852    15
1962         70,551   1,169   554,894   4,007   2,827   130,081    16

Source: J. Longley, “An Appraisal of Least-Squares Programs from the Point of the User,”
Journal of the American Statistical Association, vol. 62, 1967, pp. 819-841.

Assume that our objective is to predict Y on the basis of the six X variables. Using 
EViews6, we obtain the following regression results: 

Dependent Variable: Y
Sample: 1947-1962

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            -3482259.      890420.4      -3.910803      0.0036
X1            15.06187      84.91493       0.177376      0.8631
X2           -0.035819      0.033491      -1.069516      0.3127
X3           -2.020230      0.488400      -4.13642       0.0025
X4           -1.033227      0.214274      -4.821985      0.0009
X5           -0.051104      0.226073      -0.226051      0.8262
X6            1829.151      455.4785       4.015890      0.0030

R-squared            0.995479    Mean dependent var.      65317.00
Adjusted R-squared   0.992465    S.D. dependent var.      3511.968
S.E. of regression   304.8541    Akaike info criterion    14.57718
Sum squared resid.   836424.1    Schwarz criterion        14.91519
Log likelihood      -109.6174    F-statistic              330.2853
Durbin-Watson stat.  2.559488    Prob(F-statistic)        0.000000


A glance at these results would suggest that we have the collinearity problem, for the $R^2$
value is very high, but quite a few variables are statistically insignificant ($X_1$, $X_2$, and $X_5$), a
classic symptom of multicollinearity. To shed more light on this, we show in Table 10.9 the
intercorrelations among the six regressors.

This table gives what is called the correlation matrix. In this table the entries on the
main diagonal (those running from the upper left-hand corner to the lower right-hand
corner) give the correlation of one variable with itself, which is always 1 by definition, and
the entries off the main diagonal are the pair-wise correlations among the X variables. If
you take the first row of this table, this gives the correlation of $X_1$ with the other X variables.
TABLE 10.9
Intercorrelations

            X1          X2          X3          X4          X5          X6
X1    1.000000    0.991589    0.620633    0.464744    0.979163    0.991149
X2    0.991589    1.000000    0.604261    0.446437    0.991090    0.995273
X3    0.620633    0.604261    1.000000   -0.177421    0.686552    0.668257
X4    0.464744    0.446437   -0.177421    1.000000    0.364416    0.417245
X5    0.979163    0.991090    0.686552    0.364416    1.000000    0.993953
X6    0.991149    0.995273    0.668257    0.417245    0.993953    1.000000


For example, 0.991589 is the correlation between $X_1$ and $X_2$, 0.620633 is the correlation
between $X_1$ and $X_3$, and so on.

As you can see, several of these pair-wise correlations are quite high, suggesting that 
there may be a severe collinearity problem. Of course, remember the warning given earlier 
that such pair-wise correlations may be a sufficient but not a necessary condition for the 
existence of multicollinearity. 

To shed further light on the nature of the multicollinearity problem, let us run the
auxiliary regressions, that is, the regression of each X variable on the remaining X variables. To
save space, we will present only the $R^2$ values obtained from these regressions, which are
given in Table 10.10. Since the $R^2$ values in the auxiliary regressions are very high (with the
possible exception of the regression of $X_4$ on the remaining X variables), it seems that we do
have a serious collinearity problem. The same information is obtained from the tolerance
factors. As noted previously, the closer the tolerance factor is to zero, the greater is the
evidence of collinearity.

Applying Klein’s rule of thumb, we see that the $R^2$ values obtained from the auxiliary
regressions exceed the overall $R^2$ value (that is, the one obtained from the regression of $Y$
on all the X variables) of 0.9955 in 3 out of 6 auxiliary regressions, again suggesting that
indeed the Longley data are plagued by the multicollinearity problem. Incidentally, applying
the F test given in Eq. (10.7.3), the reader should verify that the $R^2$ values given in the
preceding tables are all statistically significantly different from zero.
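The auxiliary-regression $R^2$ values, tolerances, and Klein’s rule can be reproduced from Table 10.8. Rather than running six separate auxiliary regressions, the sketch below relies on a standard identity: the $j$th diagonal element of the inverse of the regressors’ correlation matrix equals $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from the regression of $X_j$ on the remaining regressors. The helpers `corr_matrix` and `invert` are our own. Because only correlations enter, neither the scaling of the variables nor the coding of time (1 to 16) affects the results. The computed values should closely match Tables 10.9 and 10.10, though the last digit may differ in places.

```python
from math import sqrt

# Regressors X1..X6 from Table 10.8 (Y is not needed for the auxiliary regressions).
LONGLEY_X = [
    # X1,    X2,    X3,   X4,     X5,  X6
    (830, 234289, 2356, 1590, 107608, 1),
    (885, 259426, 2325, 1456, 108632, 2),
    (882, 258054, 3682, 1616, 109773, 3),
    (895, 284599, 3351, 1650, 110929, 4),
    (962, 328975, 2099, 3099, 112075, 5),
    (981, 346999, 1932, 3594, 113270, 6),
    (990, 365385, 1870, 3547, 115094, 7),
    (1000, 363112, 3578, 3350, 116219, 8),
    (1012, 397469, 2904, 3048, 117388, 9),
    (1046, 419180, 2822, 2857, 118734, 10),
    (1084, 442769, 2936, 2798, 120445, 11),
    (1108, 444546, 4681, 2637, 121950, 12),
    (1126, 482704, 3813, 2552, 123366, 13),
    (1142, 502601, 3931, 2514, 125368, 14),
    (1157, 518173, 4806, 2572, 127852, 15),
    (1169, 554894, 4007, 2827, 130081, 16),
]
NAMES = ["X1", "X2", "X3", "X4", "X5", "X6"]
OVERALL_R2 = 0.995479   # R-squared of Y on all six regressors (EViews output)

def corr_matrix(rows):
    """Pairwise Pearson correlations between the columns of `rows`."""
    cols = list(zip(*rows))
    n = len(rows)
    dev = []
    for c in cols:
        m = sum(c) / n
        dev.append([v - m for v in c])
    ss = [sum(d * d for d in col) for col in dev]
    k = len(cols)
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / sqrt(ss[i] * ss[j])
             for j in range(k)] for i in range(k)]

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

R = corr_matrix(LONGLEY_X)
R_inv = invert(R)

results = {}
for j, name in enumerate(NAMES):
    vif = R_inv[j][j]              # VIF_j = 1 / (1 - R_j^2)
    r2_aux = 1.0 - 1.0 / vif       # R^2 of X_j on the other five regressors
    results[name] = (r2_aux, 1.0 - r2_aux, vif)   # (R^2, TOL, VIF)

# Klein's rule: count auxiliary R^2 values that exceed the overall R^2.
klein_count = sum(1 for r2, _, _ in results.values() if r2 > OVERALL_R2)

print("Var.  aux R^2   TOL      VIF")
for name, (r2, tol, vif) in results.items():
    print(f"{name:4s}  {r2:.4f}   {tol:.4f}   {vif:8.2f}")
print(f"Auxiliary R^2 above overall R^2 in {klein_count} of {len(NAMES)} cases")
```

In this sketch the $X_4$ (armed forces) regression should be the only one whose tolerance is well away from zero, in line with the discussion above.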

We noted earlier that the OLS estimators and their standard errors are sensitive to small
changes in the data. In Exercise 10.32 the reader is asked to rerun the regression of $Y$ on all
six X variables but drop the last observation, that is, run the regression for the
period 1947-1961. You will see how the regression results change by dropping just a single
year’s observation.

TABLE 10.10
R² Values from the Auxiliary Regressions

Dependent Variable    R² Value    Tolerance (TOL) = 1 - R²
X1                    0.9926      0.0074
X2                    0.9994      0.0006
X3                    0.9702      0.0298
X4                    0.7213      0.2787
X5                    0.9970      0.0030
X6                    0.9986      0.0014

Now that we have established that we have the multicollinearity problem, what “remedial”
actions can we take? Let us reconsider our original model. First of all, we could
express GNP not in nominal terms, but in real terms, which we can do by dividing nominal
GNP by the implicit price deflator. Second, since noninstitutionalized population over 14 years
of age grows over time because of natural population growth, it will be highly correlated
with time, the variable $X_6$ in our model. Therefore, instead of keeping both these variables,
we will keep the variable $X_5$ and drop $X_6$. Third, there is no compelling reason to include $X_3$,
the number of people unemployed; perhaps the unemployment rate would have been a
better measure of labor market conditions. But we have no data on the latter. So, we will
drop the variable $X_3$. Making these changes, we obtain the following regression results
(RGNP = real GNP):45

Dependent Variable: Y
Sample: 1947-1962

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C             65720.37      10624.81       6.185558      0.0000
RGNP          9.736496      1.791552       5.434671      0.0002
X4           -0.687966      0.322238      -2.134965      0.0541
X5           -0.299537      0.141761      -2.112965      0.0562

R-squared            0.981404    Mean dependent var.      65317.00
Adjusted R-squared   0.976755    S.D. dependent var.      3511.968
S.E. of regression   535.4492    Akaike info criterion    15.61641
Sum squared resid.   3440470.    Schwarz criterion        15.80955
Log likelihood      -120.9313    F-statistic              211.0972
Durbin-Watson stat.  1.654069    Prob(F-statistic)        0.000000

Although the $R^2$ value has declined slightly compared with the original $R^2$, it is still very
high. Now all the estimated coefficients are significant (those of $X_4$ and $X_5$ only at about the
6 percent level), and the signs of the coefficients make economic sense.

We leave it for the reader to devise alternative models and see how the results change.
Also keep in mind the warning sounded earlier about using the ratio method of transforming
the data to alleviate the problem of collinearity. We will revisit this question in Chapter 11.

Summary and Conclusions

1. One of the assumptions of the classical linear regression model is that there is no
multicollinearity among the explanatory variables, the X’s. Broadly interpreted,
multicollinearity refers to the situation where there is either an exact or approximately exact
linear relationship among the X variables.

2. The consequences of multicollinearity are as follows: If there is perfect collinearity
among the X’s, their regression coefficients are indeterminate and their standard errors
are not defined. If collinearity is high but not perfect, estimation of regression
coefficients is possible but their standard errors tend to be large. As a result, the population
values of the coefficients cannot be estimated precisely. However, if the objective is to
estimate linear combinations of these coefficients, the estimable functions, this can be
done even in the presence of perfect multicollinearity.

3. Although there are no sure methods of detecting collinearity, there are several indicators
of it, which are as follows:

(a) The clearest sign of multicollinearity is when $R^2$ is very high but none of the
regression coefficients is statistically significant on the basis of the conventional t test. This
case is, of course, extreme.


45 The coefficient of correlation between $X_5$ and $X_6$ is about 0.9939, a very high correlation indeed.
(b) In models involving just two explanatory variables, a fairly good idea of collinearity
can be obtained by examining the zero-order, or simple, correlation coefficient
between the two variables. If this correlation is high, multicollinearity is generally
the culprit.

(c) However, the zero-order correlation coefficients can be misleading in models
involving more than two X variables since it is possible to have low zero-order
correlations and yet find high multicollinearity. In situations like these, one may need to
examine the partial correlation coefficients.

(d) If $R^2$ is high but the partial correlations are low, multicollinearity is a possibility.
Here one or more variables may be superfluous. But if $R^2$ is high and the partial
correlations are also high, multicollinearity may not be readily detectable. Also, as
pointed out by C. Robert Wichers, Krishna Kumar, John O’Hagan, and Brendan
McCabe, there are some statistical problems with the partial correlation test
suggested by Farrar and Glauber.

(e) Therefore, one may regress each of the $X_i$ variables on the remaining X variables in
the model and find out the corresponding coefficients of determination $R_i^2$. A high
$R_i^2$ would suggest that $X_i$ is highly correlated with the rest of the X’s. Thus, one may
drop that $X_i$ from the model, provided it does not lead to serious specification bias.

4. Detection of multicollinearity is half the battle. The other half is concerned with how to
get rid of the problem. Again there are no sure methods, only a few rules of thumb. Some
of these rules are as follows: (1) using extraneous or prior information, (2) combining
cross-sectional and time series data, (3) omitting a highly collinear variable, (4)
transforming data, and (5) obtaining additional or new data. Of course, which of these rules
will work in practice will depend on the nature of the data and severity of the collinearity
problem.

5. We noted the role of multicollinearity in prediction and pointed out that unless the 
collinearity structure continues in the future sample it is hazardous to use the estimated 
regression that has been plagued by multicollinearity for the purpose of forecasting. 

6. Although multicollinearity has received extensive (some would say excessive) attention in
the literature, an equally important problem encountered in empirical research is that of
micronumerosity, smallness of sample size. According to Goldberger, “When a research
article complains about multicollinearity, readers ought to see whether the complaints
would be convincing if ‘micronumerosity’ were substituted for ‘multicollinearity.’”46 He
suggests that the reader ought to decide how small $n$, the number of observations, is before
deciding that one has a small-sample problem, just as one decides how high an $R^2$ value is
in an auxiliary regression before declaring that the collinearity problem is very severe.


Questions 

10.1. In the k-variable linear regression model there are k normal equations to estimate the
k unknowns. These normal equations are given in Appendix C. Assume that $X_k$ is a
perfect linear combination of the remaining X variables. How would you show that
in this case it is impossible to estimate the k regression coefficients?


46 Goldberger, op. cit., p. 250.
TABLE 10.11

   Y    X2    X3
 -10     1     1
  -8     2     3
  -6     3     5
  -4     4     7
  -2     5     9
   0     6    11
   2     7    13
   4     8    15
   6     9    17
   8    10    19
  10    11    21


10.2. Consider the set of hypothetical data in Table 10.11. Suppose you want to fit the
model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

to the data.

a. Can you estimate the three unknowns? Why or why not? 

b. If not, what linear functions of these parameters, the estimable functions, can you 
estimate? Show the necessary calculations. 

10.3. Refer to the child mortality example discussed in Chapter 8 (Example 8.1). The 
example there involved the regression of the child mortality (CM) rate on per capita 
GNP (PGNP) and female literacy rate (FLR). Now suppose we add the variable, total 
fertility rate (TFR). This gives the following regression results. 

Dependent Variable: CM

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C             168.3067      32.89165       5.117003      0.0000
PGNP         -0.005511      0.001878      -2.934275      0.0047
FLR          -1.768029      0.248017      -7.128663      0.0000
TFR           12.86864      4.190533       3.070883      0.0032

R-squared            0.747372    Mean dependent var.      141.5000
Adjusted R-squared   0.734740    S.D. dependent var.      75.97807
S.E. of regression   39.13157    Akaike info criterion    10.23218
Sum squared resid.   91875.38    Schwarz criterion        10.36711
Log likelihood      -323.4298    F-statistic              89.16767
Durbin-Watson stat.  2.170318    Prob(F-statistic)        0.000000


a. Compare these regression results with those given in Eq. (8.1.4). What changes 
do you see? How do you account for them? 

b. Is it worth adding the variable TFR to the model? Why? 

c. Since all the individual t coefficients are statistically significant, can we say that 
we do not have a collinearity problem in the present case? 

10.4. If the relation $\lambda_1 X_{1i} + \lambda_2 X_{2i} + \lambda_3 X_{3i} = 0$ holds true for all values of $\lambda_1$, $\lambda_2$, and
$\lambda_3$, estimate $r_{12.3}$, $r_{13.2}$, and $r_{23.1}$. Also find $R^2_{1.23}$, $R^2_{2.13}$, and $R^2_{3.12}$. What is the
degree of multicollinearity in this situation? Note: $R^2_{1.23}$ is the coefficient of
determination in the regression of $Y$ on $X_2$ and $X_3$. Other $R^2$ values are to be interpreted
similarly.

10.5. Consider the following model: 

$$Y_t = \beta_1 + \beta_2 X_t + \beta_3 X_{t-1} + \beta_4 X_{t-2} + \beta_5 X_{t-3} + \beta_6 X_{t-4} + u_t$$

where $Y$ = consumption, $X$ = income, and $t$ = time. The preceding model postulates
that consumption expenditure at time $t$ is a function not only of income at time
$t$ but also of income through previous periods. Thus, consumption expenditure in
the first quarter of 2000 is a function of income in that quarter and the four quarters
of 1999. Such models are called distributed lag models, and we shall discuss them
in a later chapter.

a. Would you expect multicollinearity in such models and why? 

b. If collinearity is expected, how would you resolve the problem? 

10.6. Consider the illustrative example of Section 10.6 (Example 10.1). How would you 
reconcile the difference in the marginal propensity to consume obtained from 
Eqs. (10.6.1) and (10.6.4)? 

10.7. In data involving economic time series such as GNP, money supply, prices, income, 
unemployment, etc., multicollinearity is usually suspected. Why? 

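10.8. Suppose in the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

that $r_{23}$, the coefficient of correlation between $X_2$ and $X_3$, is zero. Therefore,
someone suggests that you run the following regressions:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + u_{1i}$$
$$Y_i = \gamma_1 + \gamma_3 X_{3i} + u_{2i}$$

a. Will $\hat{\alpha}_2 = \hat{\beta}_2$ and $\hat{\gamma}_3 = \hat{\beta}_3$? Why?

b. Will $\hat{\beta}_1$ equal $\hat{\alpha}_1$ or $\hat{\gamma}_1$ or some combination thereof?

c. Will $\operatorname{var}(\hat{\beta}_2) = \operatorname{var}(\hat{\alpha}_2)$ and $\operatorname{var}(\hat{\beta}_3) = \operatorname{var}(\hat{\gamma}_3)$?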

10.9. Refer to the illustrative example of Chapter 7 where we fitted the Cobb-Douglas
production function to the manufacturing sector of all 50 states and the
District of Columbia for 2005. The results of the regression given in Eq. (7.9.4) 
show that both the labor and capital coefficients are individually statistically 
significant. 

a. Find out whether the variables labor and capital are highly correlated. 

b. If your answer to (a) is affirmative, would you drop, say, the labor variable from 
the model and regress the output variable on capital input only? 

c. If you do so, what kind of specification bias is committed? Find out the nature 
of this bias. 

10.10. Refer to Example 7.4. For this problem the correlation matrix is as follows:

$$\begin{array}{c|ccc} & X_i & X_i^2 & X_i^3 \\ \hline X_i & 1 & 0.9742 & 0.9284 \\ X_i^2 & & 1.0 & 0.9872 \\ X_i^3 & & & 1.0 \end{array}$$

a. “Since the zero-order correlations are very high, there must be serious
multicollinearity.” Comment.

b. Would you drop variables $X_i^2$ and $X_i^3$ from the model?

c. If you drop them, what will happen to the value of the coefficient of $X_i$?

10.11. Stepwise regression. In deciding on the “best” set of explanatory variables for a
regression model, researchers often follow the method of stepwise regression. In
this method one proceeds either by introducing the X variables one at a time
(stepwise forward regression) or by including all the possible X variables in one
multiple regression and rejecting them one at a time (stepwise backward regression).
The decision to add or drop a variable is usually made on the basis of the
contribution of that variable to the ESS, as judged by the F test. Knowing what you
do now about multicollinearity, would you recommend either procedure? Why or
why not?*

10.12. State with reason whether the following statements are true, false, or uncertain: 

a. Despite perfect multicollinearity, OLS estimators are BLUE. 

b. In cases of high multicollinearity, it is not possible to assess the individual
significance of one or more partial regression coefficients.

c. If an auxiliary regression shows that a particular $R^2$ is high, there is definite
evidence of high collinearity.

d. High pair-wise correlations do not suggest that there is high multicollinearity. 

e. Multicollinearity is harmless if the objective of the analysis is prediction only. 

f. Ceteris paribus, the higher the VIF is, the larger the variances of OLS estimators. 

g. The tolerance (TOL) is a better measure of multicollinearity than the VIF. 

h. You will not obtain a high $R^2$ value in a multiple regression if all the partial slope
coefficients are individually statistically insignificant on the basis of the usual
t test.

i. In the regression of $Y$ on $X_2$ and $X_3$, suppose there is little variability in the
values of $X_3$. This would increase $\operatorname{var}(\hat{\beta}_3)$. In the extreme, if all $X_3$ are identical,
$\operatorname{var}(\hat{\beta}_3)$ is infinite.

10.13. a. Show that if $r_{1i} = 0$ for $i = 2, 3, \ldots, k$ then

$$R_{1.23 \ldots k} = 0$$

b. What is the importance of this finding for the regression of variable $X_1\,(= Y)$ on
$X_2, X_3, \ldots, X_k$?

10.14. Suppose all the zero-order correlation coefficients of $X_1\,(= Y), X_2, \ldots, X_k$ are
equal to $r$.

a. What is the value of $R^2_{1.23 \ldots k}$?

b. What are the values of the first-order correlation coefficients?

**10.15. In matrix notation it can be shown (see Appendix C) that

$$\hat{\beta} = (X'X)^{-1} X'y$$

a. What happens to $\hat{\beta}$ when there is perfect collinearity among the X’s?

b. How would you know if perfect collinearity exists?

*See if your reasoning agrees with that of Arthur S. Goldberger and D. B. Jochems, "Note on Stepwise Least-Squares," Journal of the American Statistical Association, vol. 56, March 1961, pp. 105-110.
**Optional.
10.16. Using matrix notation, it can be shown that

$$\text{var-cov}(\hat{\boldsymbol\beta}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$

What happens to this var-cov matrix:


a. When there is perfect multicollinearity? 

b. When collinearity is high but not perfect? 
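Exercises 10.15 and 10.16 can be previewed numerically. Under perfect collinearity, X loses full column rank, so $\mathbf{X}'\mathbf{X}$ cannot be inverted. Under near collinearity, the inverse exists but has very large diagonal elements. The sketch below uses synthetic regressors and an assumed error variance $\sigma^2 = 1$. None of these numbers come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
ones = np.ones(n)
x2 = rng.normal(size=n)

# (a) Perfect collinearity: X3 = 2 * X2 exactly, so X (and X'X) lose full rank
X_perfect = np.column_stack([ones, x2, 2.0 * x2])
rank_perfect = np.linalg.matrix_rank(X_perfect)
print("rank of X:", rank_perfect, "of", X_perfect.shape[1])  # X'X is singular

# (b) High but imperfect collinearity: X3 = 2 * X2 plus a little noise
x3_near = 2.0 * x2 + 0.01 * rng.normal(size=n)
x3_indep = rng.normal(size=n)             # comparison: X3 unrelated to X2
sigma2 = 1.0                              # assumed error variance


def var_cov(x3):
    """sigma^2 (X'X)^{-1} for the regressors [1, X2, X3]."""
    X = np.column_stack([ones, x2, x3])
    return sigma2 * np.linalg.inv(X.T @ X)


vc_near, vc_indep = var_cov(x3_near), var_cov(x3_indep)
print("var(b3), near-collinear X3:", vc_near[2, 2])
print("var(b3), unrelated X3:     ", vc_indep[2, 2])
```
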


10.17. Consider the following correlation matrix, whose rows and columns correspond to $X_2, X_3, \ldots, X_k$:

$$\mathbf{R} = \begin{bmatrix} 1 & r_{23} & \cdots & r_{2k} \\ r_{32} & 1 & \cdots & r_{3k} \\ \vdots & \vdots & \ddots & \vdots \\ r_{k2} & r_{k3} & \cdots & 1 \end{bmatrix}$$

Describe how you would find out from the correlation matrix whether (a) there is perfect collinearity, (b) there is less than perfect collinearity, and (c) the X's are uncorrelated.

Hint: You may use |R| to answer these questions, where |R| denotes the determinant of R.
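One way to act on the hint is to compute |R| directly. For a correlation matrix, |R| = 0 under perfect collinearity, 0 < |R| < 1 under less than perfect collinearity, and |R| = 1 when all the off-diagonal correlations are zero. The Python sketch below checks the three cases on synthetic data. The helper name `corr_det` is hypothetical.

```python
import numpy as np


def corr_det(X):
    """Determinant |R| of the sample correlation matrix of the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    return float(np.linalg.det(R))


rng = np.random.default_rng(1)
n = 200
x2, x3, x4 = rng.normal(size=(3, n))

# (a) Perfect collinearity: the third column is an exact linear combination
d_perfect = corr_det(np.column_stack([x2, x3, 3.0 * x2 - x3]))
# (b) Less than perfect collinearity: strongly but not exactly related
d_near = corr_det(np.column_stack([x2, x3, x2 + 0.3 * x4]))
# (c) Uncorrelated columns: orthogonalize random columns against a constant
A = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
Q, _ = np.linalg.qr(A)
d_uncorr = corr_det(Q[:, 1:])   # zero-mean, mutually orthogonal columns

print(d_perfect, d_near, d_uncorr)   # approx. 0, between 0 and 1, approx. 1
```
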

*10.18. Orthogonal explanatory variables. Suppose in the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$

$X_2$ to $X_k$ are all uncorrelated. Such variables are called orthogonal variables. If this is the case:

a. What will be the structure of the $(\mathbf{X}'\mathbf{X})$ matrix?

b. How would you obtain $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$?

c. What will be the nature of the var-cov matrix of $\hat{\boldsymbol\beta}$?

d. Suppose you have run the regression and afterward you want to introduce another orthogonal variable, say, $X_{k+1}$, into the model. Do you have to recompute all the previous coefficients $\hat\beta_1$ to $\hat\beta_k$? Why or why not?
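The following Python sketch illustrates parts (a) and (d) on synthetic data. It assumes the regressors are built to have zero mean, which makes them orthogonal to the intercept as well as to one another. With uncorrelated regressors that have nonzero means, $\mathbf{X}'\mathbf{X}$ is diagonal only in deviation form, and the intercept generally shifts when a variable is added. The coefficients and the helper `ols` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
# Orthogonalize random columns against a constant: the resulting regressors
# have zero mean and are mutually orthogonal (hence uncorrelated).
A = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
Q, _ = np.linalg.qr(A)
Z = 10.0 * Q[:, 1:]                      # three orthogonal regressors
y = 5.0 + Z @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


X_k = np.column_stack([np.ones(n), Z[:, :2]])   # constant, X2, X3
X_k1 = np.column_stack([X_k, Z[:, 2]])           # ... plus the new X4
XtX = X_k1.T @ X_k1
b_k, b_k1 = ols(X_k, y), ols(X_k1, y)

print(np.round(XtX, 6))    # (a) a diagonal matrix
print(b_k, b_k1[:3])       # (d) the earlier coefficients are unchanged
```
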

10.19. Consider the following model:

$$\text{GNP}_t = \beta_1 + \beta_2 M_t + \beta_3 M_{t-1} + \beta_4 (M_t - M_{t-1}) + u_t$$

where $\text{GNP}_t$ = GNP at time t, $M_t$ = money supply at time t, $M_{t-1}$ = money supply at time $(t-1)$, and $(M_t - M_{t-1})$ = change in the money supply between time t and time $(t-1)$. This model thus postulates that the level of GNP at time t is a function of the money supply at time t and time $(t-1)$ as well as the change in the money supply between these time periods.

a. Assuming you have the data to estimate the preceding model, would you succeed in estimating all the coefficients of this model? Why or why not?

b. If not, what coefficients can be estimated?

c. Suppose that the $\beta_3 M_{t-1}$ term were absent from the model. Would your answer to (a) be the same?

d. Repeat (c), assuming that the term $\beta_2 M_t$ were absent from the model.
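A rank check is one quick way to see what happens in parts (a), (c), and (d). The sketch below builds a hypothetical money-supply series (a random walk around 100, not real data) and compares the number of columns in each regressor matrix with its rank. A rank below the column count signals perfect collinearity.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 100.0 + np.cumsum(rng.normal(size=41))   # hypothetical money-supply series
M_t, M_lag = M[1:], M[:-1]                   # M_t and M_{t-1}
ones = np.ones(M_t.size)

X_full = np.column_stack([ones, M_t, M_lag, M_t - M_lag])
X_c = np.column_stack([ones, M_t, M_t - M_lag])     # part (c): no M_{t-1}
X_d = np.column_stack([ones, M_lag, M_t - M_lag])   # part (d): no M_t

for name, X in [("full", X_full), ("(c)", X_c), ("(d)", X_d)]:
    print(name, "columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))
```
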


*Optional.
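10.20. Show that Eqs. (7.4.7) and (7.4.8) can also be expressed as

$$\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right)\left(1 - r_{23}^2\right)}$$

$$\hat\beta_3 = \frac{\left(\sum y_i x_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_i x_{2i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right)\left(1 - r_{23}^2\right)}$$

where $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$.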
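The key step is that $(\sum x_{2i}x_{3i})^2 = r_{23}^2 \sum x_{2i}^2 \sum x_{3i}^2$, so the two denominators are equal. The Python sketch below checks this numerically on synthetic data (the coefficients are hypothetical). It computes both forms in deviation form and compares them with a direct least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
X2 = rng.normal(size=n)
X3 = 0.6 * X2 + rng.normal(size=n)
Y = 1.0 + 2.0 * X2 - 1.0 * X3 + rng.normal(size=n)

# Deviations from sample means (lowercase letters in the text)
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
S22, S33, S23 = x2 @ x2, x3 @ x3, x2 @ x3
Sy2, Sy3 = y @ x2, y @ x3
r23_sq = S23 ** 2 / (S22 * S33)

# Form of Eqs. (7.4.7)-(7.4.8)
b2 = (Sy2 * S33 - Sy3 * S23) / (S22 * S33 - S23 ** 2)
b3 = (Sy3 * S22 - Sy2 * S23) / (S22 * S33 - S23 ** 2)
# Equivalent form using r23
b2_r = (Sy2 * S33 - Sy3 * S23) / (S22 * S33 * (1 - r23_sq))
b3_r = (Sy3 * S22 - Sy2 * S23) / (S22 * S33 * (1 - r23_sq))

# Cross-check against a direct least-squares fit with an intercept
coef = np.linalg.lstsq(np.column_stack([np.ones(n), X2, X3]), Y, rcond=None)[0]
print(b2, b2_r, coef[1])
print(b3, b3_r, coef[2])
```
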

10.21. Using Eqs. (7.4.12) and (7.4.15), show that when there is perfect collinearity, the variances of $\hat\beta_2$ and $\hat\beta_3$ are infinite.

10.22. Verify that the standard errors of the sums of the slope coefficients estimated from 
Eqs. (10.5.6) and (10.5.7) are, respectively, 0.1549 and 0.1825. (See Section 10.5.) 

10.23. For the K-variable regression model, it can be shown that the variance of the kth (k = 2, 3, ..., K) partial regression coefficient given in Eq. (7.5.6) can also be expressed as*

$$\operatorname{var}(\hat\beta_k) = \frac{1}{n-K}\,\frac{\sigma_y^2}{\sigma_k^2}\,\frac{1-R^2}{1-R_k^2}$$

where $\sigma_y^2$ = variance of Y, $\sigma_k^2$ = variance of the kth explanatory variable, $R_k^2$ = $R^2$ from the regression of $X_k$ on the remaining X variables, and $R^2$ = coefficient of determination from the multiple regression, that is, the regression of Y on all the X variables.

a. Other things the same, if $\sigma_k^2$ increases, what happens to var($\hat\beta_k$)? What are the implications for the multicollinearity problem?

b. What happens to the preceding formula when collinearity is perfect?

c. True or false: "The variance of $\hat\beta_k$ decreases as $R^2$ rises, so that the effect of a high $R_k^2$ can be offset by a high $R^2$."
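The formula can be checked against the usual OLS variance $\hat\sigma^2[(\mathbf{X}'\mathbf{X})^{-1}]_{kk}$. The Python sketch below does this on synthetic data. It assumes that $\sigma_y^2$ and $\sigma_k^2$ are sample variances computed with divisor n, which is the convention under which the identity holds exactly. The data-generating values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 40, 4                         # K parameters including the intercept
X_vars = rng.normal(size=(n, K - 1))
X_vars[:, 1] += 0.8 * X_vars[:, 0]   # induce some collinearity
X = np.column_stack([np.ones(n), X_vars])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=n)


def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X includes a constant column)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)


# Conventional OLS variance: sigma_hat^2 * [(X'X)^{-1}]_kk
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
sigma2_hat = (e @ e) / (n - K)
var_ols = sigma2_hat * np.linalg.inv(X.T @ X)

k = 2                                                 # column holding X_3
R2 = r_squared(X, y)
R2_k = r_squared(np.delete(X, k, axis=1), X[:, k])   # auxiliary regression
s2_y, s2_k = y.var(), X[:, k].var()                  # divisor n
var_formula = (1.0 / (n - K)) * (s2_y / s2_k) * (1 - R2) / (1 - R2_k)

print(var_ols[k, k], var_formula)   # the two should agree
```
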

10.24. From the annual data for the U.S. manufacturing sector for 1899-1922, Dougherty obtained the following regression results:†

$$\begin{aligned} \log Y &= 2.81 - 0.53\log K + 0.91\log L + 0.047t \\ \text{se} &= \quad (1.38) \qquad (0.34) \qquad (0.14) \qquad (0.021) \\ R^2 &= 0.97 \qquad F = 189.8 \end{aligned} \tag{1}$$

where Y = index of real output, K = index of real capital input, L = index of real labor input, and t = time or trend.

Using the same data, he also obtained the following regression:

$$\begin{aligned} \log(Y/L) &= -0.11 + 0.11\log(K/L) + 0.006t \\ \text{se} &= \quad (0.03) \qquad (0.15) \qquad (0.006) \\ R^2 &= 0.65 \qquad F = 19.5 \end{aligned} \tag{2}$$


*This formula is given by R. Stone, "The Analysis of Market Demand," Journal of the Royal Statistical Society, vol. B7, 1945, p. 297. Also recall Eq. (7.5.6). For further discussion, see Peter Kennedy, A Guide to Econometrics, 2d ed., The MIT Press, Cambridge, Mass., 1985, p. 156.

†Christopher Dougherty, Introduction to Econometrics, Oxford University Press, New York, 1992, pp. 159-160.
a. Is there multicollinearity in regression (1)? How do you know?

b. In regression (1), what is the a priori sign of log K? Do the results conform to this expectation? Why or why not?

c. How would you justify the functional form of regression (1)? (Hint: Cobb-Douglas production function.)

d. Interpret regression (1). What is the role of the trend variable in this regression? 

e. What is the logic behind estimating regression (2)? 

f. If there was multicollinearity in regression (1), has that been reduced by regression (2)? How do you know?

g. If regression (2) is a restricted version of regression (1), what restriction is 
imposed by the author? (Hint: returns to scale.) How do you know if this 
restriction is valid? Which test do you use? Show all your calculations. 

h. Are the R 2 values of the two regressions comparable? Why or why not? How 
would you make them comparable, if they are not comparable in the present 
form? 
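Part (g) can be approached with the restricted least-squares F test. One reading of the hint is constant returns to scale, meaning the coefficients of log K and log L in (1) sum to one. Imposing that restriction on (1) and moving log L to the left-hand side gives exactly regression (2), so the residual sum of squares (RSS) of (2) serves as the restricted RSS. The $R^2$ form of the test is not appropriate here because the two regressands differ; this is the issue raised in part (h). The text does not report the RSS values, so the inputs in the usage line below are hypothetical.

```python
def restricted_f(rss_r: float, rss_ur: float, m: int, n: int, k: int) -> float:
    """Restricted least-squares F statistic with (m, n - k) df.

    rss_r  : residual sum of squares of the restricted regression
    rss_ur : residual sum of squares of the unrestricted regression
    m      : number of linear restrictions
    n, k   : number of observations and of parameters in the unrestricted model
    F = [(RSS_R - RSS_UR) / m] / [RSS_UR / (n - k)]
    """
    if rss_ur <= 0 or n <= k or m <= 0:
        raise ValueError("need rss_ur > 0, n > k and m > 0")
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))


# Hypothetical RSS values only; regression (1) uses n = 24 annual observations
# (1899-1922) and k = 4 parameters, and one restriction is imposed (m = 1)
print(restricted_f(rss_r=1.2, rss_ur=1.0, m=1, n=24, k=4))
```

The computed F would then be compared with the critical F value for (1, 20) degrees of freedom at the chosen level of significance.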

10.25. Critically evaluate the following statements: 

a. “In fact, multicollinearity is not a modeling error. It is a condition of deficient 
data.”* 

b. “If it is not feasible to obtain more data, then one must accept the fact that the 
data one has contain a limited amount of information and must simplify the 
model accordingly. Trying to estimate models that are too complicated is one of 
the most common mistakes among inexperienced applied econometricians.”** 

c. “It is common for researchers to claim that multicollinearity is at work whenever their hypothesized signs are not found in the regression results, when variables that they know a priori to be important have insignificant t values, or when various regression results are changed substantively whenever an explanatory variable is deleted. Unfortunately, none of these conditions is either necessary or sufficient for the existence of collinearity, and furthermore none provides any useful suggestions as to what kind of extra information might be required to solve the estimation problem they present.”†

d. “... any time series regression containing more than four independent variables results in garbage.”‡

Empirical Exercises 

10.26. Klein and Goldberger attempted to fit the following regression model to the U.S. economy:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$$

where Y = consumption, $X_2$ = wage income, $X_3$ = nonwage, nonfarm income, and $X_4$ = farm income. But since $X_2$, $X_3$, and $X_4$ are expected to be highly collinear, they obtained estimates of $\beta_3$ and $\beta_4$ from cross-sectional analysis as follows:

*Samprit Chatterjee, Ali S. Hadi, and Bertram Price, Regression Analysis by Example, 3d ed., John Wiley & Sons, New York, 2000, p. 226.

**Russell Davidson and James C. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 1993, p. 186.

†Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 187.

‡This quote, attributed to the late econometrician Zvi Griliches, is obtained from Ernst R. Berndt, The Practice of Econometrics: Classic and Contemporary, Addison Wesley, Reading, Mass., 1991, p. 224.
TABLE 10.12
Source: L. R. Klein and A. S. Goldberger, An Economic Model of the United States, 1929-1952, North Holland Publishing Company,

Year       Y        X2       X3      X4
1936      62.8    43.41    17.10    3.96
1937      65.0    46.44    18.65    5.48
1938      63.9    44.35    17.09    4.37
1939      67.5    47.82    19.28    4.51
1940      71.3    51.02    23.24    4.88
1941      76.6    58.71    28.11    6.37
1945*     86.3    87.69    30.29    8.96
1946      95.7    76.73    28.26    9.76
1947      98.3    75.91    27.91    9.31
1948     100.3    77.62    32.30    9.85
1949     103.2    78.01    31.39    7.21
1950     108.9    83.57    35.61    7.39
1951     108.5    90.59    37.58    7.98
1952     111.4    95.47    35.17    7.42

*The data for the war years 1942-1944 are missing. The data for other years are in billions of 1939 dollars.




$\beta_3 = 0.75\beta_2$ and $\beta_4 = 0.625\beta_2$. Using these estimates, they reformulated their consumption function as follows:

$$Y_i = \beta_1 + \beta_2(X_{2i} + 0.75X_{3i} + 0.625X_{4i}) + u_i = \beta_1 + \beta_2 Z_i + u_i$$

where $Z_i = X_{2i} + 0.75X_{3i} + 0.625X_{4i}$.

a. Fit the modified model to the data in Table 10.12 and obtain estimates of $\beta_1$ to $\beta_4$.

b. How would you interpret the variable Z? 

10.27. Table 10.13 gives data on imports, GDP, and the Consumer Price Index (CPI) for the United States over the period 1975-2005. You are asked to consider the following model:

$$\ln \text{Imports}_t = \beta_1 + \beta_2 \ln \text{GDP}_t + \beta_3 \ln \text{CPI}_t + u_t$$

a. Estimate the parameters of this model using the data given in the table.

b. Do you suspect that there is multicollinearity in the data?

c. Regress: (1) $\ln \text{Imports}_t = A_1 + A_2 \ln \text{GDP}_t$

(2) $\ln \text{Imports}_t = B_1 + B_2 \ln \text{CPI}_t$

(3) $\ln \text{GDP}_t = C_1 + C_2 \ln \text{CPI}_t$

On the basis of these regressions, what can you say about the nature of multicollinearity in the data?


TABLE 10.13
U.S. Imports, GDP, and CPI, 1975-2005
(For all urban consumers; 1982-84 = 100, except as noted)
Bureau of Labor Statistics.

Year     CPI       GDP       Imports
1975     53.8    1,638.3       98185
1976     56.9    1,825.3      124228
1977     60.6    2,030.9      151907
1978     65.2    2,294.7      176002
1979     72.6    2,563.3      212007
1980     82.4    2,789.5      249750
1981     90.9    3,128.4      265067
1982     96.5    3,225.0      247642
1983     99.6    3,536.7      268901
1984    103.9    3,933.2      332418
1985    107.6    4,220.3      338088
1986    109.6    4,462.8      368425
1987    113.6    4,739.5      409765
1988    118.3    5,103.8      447189
1989    124.0    5,484.4      477665
1990    130.7    5,803.1      498438
1991    136.2    5,995.9      491020
1992    140.3    6,337.7      536528
1993    144.5    6,657.4      589394
1994    148.2    7,072.2      668690
1995    152.4    7,397.7      749374
1996    156.9    7,816.9      803113
1997    160.5    8,304.3      876470
1998    163.0    8,747.0      917103
1999    166.6    9,268.4     1029980
2000    172.2    9,817.0     1224408
2001    177.1   10,128.0     1145900
2002    179.9   10,469.6     1164720
2003    184.0   10,960.8     1260717
2004    188.9   11,712.5     1472926
2005    195.3   12,455.8     1677371
d. Suppose there is multicollinearity in the data but $\hat\beta_2$ and $\hat\beta_3$ are individually significant at the 5 percent level and the overall F test is also significant. In this case should we worry about the collinearity problem?
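The three regressions in part (c), together with the full model in part (a), can be run as in the following Python sketch. It uses the Table 10.13 figures as transcribed above (GDP commas removed) and reports only coefficients and $R^2$. The helper name `ols` is hypothetical. Regression (3) is the auxiliary regression between the two regressors, so its $R^2$ is the direct gauge of their collinearity.

```python
import numpy as np

# Table 10.13: (year, CPI, GDP, imports)
rows = [
    (1975, 53.8, 1638.3, 98185), (1976, 56.9, 1825.3, 124228),
    (1977, 60.6, 2030.9, 151907), (1978, 65.2, 2294.7, 176002),
    (1979, 72.6, 2563.3, 212007), (1980, 82.4, 2789.5, 249750),
    (1981, 90.9, 3128.4, 265067), (1982, 96.5, 3225.0, 247642),
    (1983, 99.6, 3536.7, 268901), (1984, 103.9, 3933.2, 332418),
    (1985, 107.6, 4220.3, 338088), (1986, 109.6, 4462.8, 368425),
    (1987, 113.6, 4739.5, 409765), (1988, 118.3, 5103.8, 447189),
    (1989, 124.0, 5484.4, 477665), (1990, 130.7, 5803.1, 498438),
    (1991, 136.2, 5995.9, 491020), (1992, 140.3, 6337.7, 536528),
    (1993, 144.5, 6657.4, 589394), (1994, 148.2, 7072.2, 668690),
    (1995, 152.4, 7397.7, 749374), (1996, 156.9, 7816.9, 803113),
    (1997, 160.5, 8304.3, 876470), (1998, 163.0, 8747.0, 917103),
    (1999, 166.6, 9268.4, 1029980), (2000, 172.2, 9817.0, 1224408),
    (2001, 177.1, 10128.0, 1145900), (2002, 179.9, 10469.6, 1164720),
    (2003, 184.0, 10960.8, 1260717), (2004, 188.9, 11712.5, 1472926),
    (2005, 195.3, 12455.8, 1677371),
]
data = np.array(rows, dtype=float)
ln_cpi, ln_gdp, ln_imp = np.log(data[:, 1]), np.log(data[:, 2]), np.log(data[:, 3])


def ols(y, *xs):
    """OLS of y on a constant and the given regressors; returns (coef, R^2)."""
    X = np.column_stack([np.ones(y.size), *xs])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, r2


results = {
    "full": ols(ln_imp, ln_gdp, ln_cpi),   # part (a)
    "(1)": ols(ln_imp, ln_gdp),            # ln Imports on ln GDP
    "(2)": ols(ln_imp, ln_cpi),            # ln Imports on ln CPI
    "(3)": ols(ln_gdp, ln_cpi),            # ln GDP on ln CPI (auxiliary)
}
for name, (coef, r2) in results.items():
    print(name, np.round(coef, 4), "R^2 =", round(r2, 4))
```
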

10.28. Refer to Exercise 7.19 about the demand function for chicken in the United States. 

a. Using the log-linear, or double-log, model, estimate the various auxiliary regressions. How many are there?

b. From these auxiliary regressions, how do you decide which regressor(s) is 
highly collinear? Which test do you use? Show the details of your calculations. 

c. If there is significant collinearity in the data, which variable(s) would you drop to reduce the severity of the collinearity problem? If you do that, what econometric problems do you face?

d. Do you have any suggestions, other than dropping variables, to ameliorate the 
collinearity problem? Explain. 

10.29. Table 10.14 gives data on new passenger cars sold in the United States as a function of several variables.

a. Develop a suitable linear or log-linear model to estimate a demand function for 
automobiles in the United States. 

b. If you decide to include all the regressors given in the table as explanatory variables, do you expect to face the multicollinearity problem? Why?

c. If you do expect to face the multicollinearity problem, how will you go about resolving the problem? State your assumptions clearly and show all the calculations explicitly.

10.30. To assess the feasibility of a guaranteed annual wage (negative income tax), the Rand Corporation conducted a study to assess the response of labor supply (average

TABLE 10.14
Passenger Car Data
1986, A Supplement to the Current Survey of Business,

Year      Y        X2      X3        X4       X5       X6
1971   10,227   112.0   121.3     776.8    4.89   79,367
1972   10,872   111.0   125.3     839.6    4.55   82,153
1973   11,350   111.1   133.1     949.8    7.38   85,064
1974    8,775   117.5   147.7   1,038.4    8.61   86,794
1975    8,539   127.6   161.2   1,142.8    6.16   85,846
1976    9,994   135.7   170.5   1,252.6    5.22   88,752
1977   11,046   142.9   181.5   1,379.3    5.50   92,017
1978   11,164   153.8   195.3   1,551.2    7.78   96,048
1979   10,559   166.0   217.7   1,729.3   10.25   98,824
1980    8,979   179.3   247.0   1,918.0   11.28   99,303
1981    8,535   190.2   272.3   2,127.6   13.73  100,397
1982    7,980   197.6   286.6   2,261.4   11.20   99,526
1983    9,179   202.6   297.4   2,428.1    8.69  100,834
1984   10,394   208.5   307.6   2,670.6    9.65  105,005
1985   11,039   215.2   318.5   2,841.1    7.75  107,150
1986   11,450   224.4   323.4   3,022.1    6.31  109,597

Y = new passenger cars sold (thousands), seasonally unadjusted.

X2 = new cars, Consumer Price Index, 1967 = 100, seasonally unadjusted.

X3 = Consumer Price Index, all items, all urban consumers, 1967 = 100, seasonally unadjusted.

X4 = the personal disposable income (PDI), billions of dollars, unadjusted for seasonal variation.

X5 = the interest rate, percent, finance company paper placed directly.

X6 = the employed civilian labor force (thousands), unadjusted for seasonal variation.
hours of work) to increasing hourly wages.* The data for this study were drawn from a national sample of 6,000 households with a male head earning less than $15,000 annually. The data were divided into 39 demographic groups for analysis. These data are given in Table 10.15. Because data for four demographic groups were missing for some variables, the data given in the table refer to only 35 demographic groups. The definitions of the various variables used in the analysis are given at the end of the table.


TABLE 10.15
Hours of Work and Other Data for 35 Groups
Source: D. H. Greenberg and M. Kosters, Income Guarantees and the Working Poor, Rand Corporation, R-579-OEO,

Obs.  Hours   Rate   ERSP  ERNO  NEIN  Assets   Age    DEP   School
 1    2157   2.905   1121   291   380    7250   38.5  2.340   10.5
 2    2174   2.970   1128   301   398    7744   39.3  2.335   10.5
 3    2062   2.350   1214   326   185    3068   40.1  2.851    8.9
 4    2111   2.511   1203    49   117    1632   22.4  1.159   11.5
 5    2134   2.791   1013   594   730   12710   57.7  1.229    8.8
 6    2185   3.040   1135   287   382    7706   38.6  2.602   10.7
 7    2210   3.222   1100   295   474    9338   39.0  2.187   11.2
 8    2105   2.493   1180   310   255    4730   39.9  2.616    9.3
 9    2267   2.838   1298   252   431    8317   38.9  2.024   11.1
10    2205   2.356    885   264   373    6789   38.8  2.662    9.5
11    2121   2.922   1251   328   312    5907   39.8  2.287   10.3
12    2109   2.499   1207   347   271    5069   39.7  3.193    8.9
13    2108   2.796   1036   300   259    4614   38.2  2.040    9.2
14    2047   2.453   1213   297   139    1987   40.3  2.545    9.1
15    2174   3.582   1141   414   498   10239   40.0  2.064   11.7
16    2067   2.909   1805   290   239    4439   39.1  2.301   10.5
17    2159   2.511   1075   289   308    5621   39.3  2.486    9.5
18    2257   2.516   1093   176   392    7293   37.9  2.042   10.1
19    1985   1.423    553   381   146    1866   40.6  3.833    6.6
20    2184   3.636   1091   291   560   11240   39.1  2.328   11.6
21    2084   2.983   1327   331   296    5653   39.8  2.208   10.2
22    2051   2.573   1194   279   172    2806   40.0  2.362    9.1
23    2127   3.262   1226   314   408    8042   39.5  2.259   10.8
24    2102   3.234   1188   414   352    7557   39.8  2.019   10.7
25    2098   2.280    973   364   272    4400   40.6  2.661    8.4
26    2042   2.304   1085   328   140    1739   41.8  2.444    8.2
27    2181   2.912   1072   304   383    7340   39.0  2.337   10.2
28    2186   3.015   1122    30   352    7292   37.2  2.046   10.9
29    2188   3.010    990   366   374    7325   38.4  2.847   10.6
30    2077   1.901    350   209    95    1370   37.4  4.158    8.2
31    2196   3.009    947   294   342    6888   37.5  3.047   10.6
32    2093   1.899    342   311   120    1425   37.5  4.512    8.1
33    2173   2.959   1116   296   387    7625   39.2  2.342   10.5
34    2179   2.971   1128   312   397    7779   39.4  2.341   10.5
35    2200   2.980   1126   204   393    7885   39.2  2.341   10.6


Notes: Hours = average hours worked during the year. 

Rate = average hourly wage (dollars). 

ERSP = average yearly earnings of spouse (dollars). 

ERNO = average yearly earnings of other family members (dollars). 
NEIN = average yearly nonearned income.

Assets = average family asset holdings (bank account, etc.) (dollars). 
Age = average age of respondent. 

Dep = average number of dependents. 

School = average highest grade of school completed. 


*D. H. Greenberg and M. Kosters, Income Guarantees and the Working Poor, Rand Corporation, R-579-OEO, December 1970.
a. Regress average hours worked during the year on the variables given in the table
and interpret your regression. 

b. Is there evidence of multicollinearity in the data? How do you know? 

c. Compute the variance inflation factors (VIF) and TOL measures for the various 
regressors. 

d. If there is a multicollinearity problem, what remedial action, if any, would you take?

e. What does this study tell us about the feasibility of a negative income tax?
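For part (c), each regressor's VIF comes from an auxiliary regression of that regressor on the others: $\text{VIF}_j = 1/(1 - R_j^2)$, and $\text{TOL}_j = 1/\text{VIF}_j$. The Python sketch below implements this for an arbitrary regressor matrix and cross-checks the result against the diagonal of the inverse correlation matrix. With Table 10.15, X would hold the eight regressors Rate through School. Loading the table is left out, and the data in the usage lines are synthetic. The function name `vif_tol` is hypothetical.

```python
import numpy as np


def vif_tol(X):
    """Return (VIF, TOL) for each column of X (regressors only, no constant).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from the auxiliary regression
    of X_j on a constant and the other columns; TOL_j = 1 / VIF_j.
    """
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(others, xj, rcond=None)[0]
        resid = xj - others @ coef
        r2_j = 1.0 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r2_j)
    return vif, 1.0 / vif


# Hypothetical data: the first two columns are nearly collinear, the third is not
rng = np.random.default_rng(7)
a = rng.normal(size=100)
X = np.column_stack([a, a + 0.1 * rng.normal(size=100), rng.normal(size=100)])
vif, tol = vif_tol(X)
print("VIF:", np.round(vif, 2))
print("TOL:", np.round(tol, 4))
# Cross-check: the VIFs equal the diagonal of the inverse correlation matrix
print(np.allclose(vif, np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))))
```
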

10.31. Table 10.16 gives data on the crime rate in 47 states in the United States for 1960. 
Try to develop a suitable model to explain the crime rate in relation to the 14 
socioeconomic variables given in the table. Pay particular attention to the collinearity 
problem in developing your model. 

10.32. Refer to the Longley data given in Section 10.10. Repeat the regression given in the 
table there by omitting the data for 1962; that is, run the regression for the period 
1947-1961. Compare the two regressions. What general conclusion can you draw 
from this exercise? 

10.33. Updated Longley data. We have extended the data given in Section 10.10 to include observations from 1959-2005. The new data are in Table 10.17. The data pertain to Y = number of people employed, in thousands; $X_1$ = GNP implicit price deflator; $X_2$ = GNP, millions of dollars; $X_3$ = number of people unemployed, in thousands; $X_4$ = number of people in the armed forces, in thousands; $X_5$ = noninstitutionalized population over 16 years of age; and $X_6$ = year, equal to 1 in 1959, 2 in 1960, and so on, up to 47 in 2005.

a. Create scatterplots as suggested in the chapter to assess the relationships 
between the independent variables. Are there any strong relationships? Do they 
seem linear? 

b. Create a correlation matrix. Which variables seem to be the most related to each 
other, not including the dependent variable? 

c. Run a standard OLS regression to predict the number of people employed in 
thousands. Do the coefficients on the independent variables behave as you would 
expect? 

d. Based on the above results, do you believe these data suffer from multicollinearity? 
*10.34. As cheese ages, several chemical processes take place that determine the taste of the final product. The data given in Table 10.18 pertain to concentrations of various chemicals in a sample of 30 mature cheddar cheeses and subjective measures of taste for each sample. The variables acetic and H2S are the natural logarithms of the concentrations of acetic acid and hydrogen sulfide, respectively. The variable lactic has not been log-transformed.

a. Draw a scatterplot of the four variables. 

b. Perform a bivariate regression of taste on acetic and H2S, and interpret your results.

c. Perform a bivariate regression of taste on lactic and H2S, and interpret the results.

d. Perform a multiple regression of taste on acetic, H2S, and lactic. Interpret your results.

e. Knowing what you know about multicollinearity, how would you decide among 
these regressions? 

f. What overall conclusions can you draw from your analysis? 


*Optional.
TABLE 10.16 U.S. Crime Data for 47 States in 1960

Obs.     R   Age  S   ED  EX0  EX1   LF     M    N   NW   U1  U2    W    X
 1    79.1  151  1   91   58   56  510   950   33  301  108  41  394  261
 2   163.5  143  0  113  103   95  583  1012   13  102   96  36  557  194
 3    57.8  142  1   89   45   44  533   969   18  219   94  33  318  250
 4   196.9  136  0  121  149  141  577   994  157   80  102  39  673  167
 5   123.4  141  0  121  109  101  591   985   18   30   91  20  578  174
 6    68.2  121  0  110  118  115  547   964   25   44   84  29  689  126
 7    96.3  127  1  111   82   79  519   982    4  139   97  38  620  168
 8   155.5  131  1  109  115  109  542   969   50  179   79  35  472  206
 9    85.6  157  1   90   65   62  553   955   39  286   81  28  421  239
10    70.5  140  0  118   71   68  632  1029    7   15  100  24  526  174
11   167.4  124  0  105  121  116  580   966  101  106   77  35  657  170
12    84.9  134  0  108   75   71  595   972   47   59   83  31  580  172
13    51.1  128  0  113   67   60  624   972   28   10   77  25  507  206
14    66.4  135  0  117   62   61  595   986   22   46   77  27  529  190
15    79.8  152  1   87   57   53  530   986   30   72   92  43  405  264
16    94.6  142  1   88   81   77  497   956   33  321  116  47  427  247
17    53.9  143  0  110   66   63  537   977   10    6  114  35  487  166
18    92.9  135  1  104  123  115  537   978   31  170   89  34  631  165
19    75.0  130  0  116  128  128  536   934   51   24   78  34  627  135
20   122.5  125  0  108  113  105  567   985   78   94  130  58  626  166
21    74.2  126  0  108   74   67  602   984   34   12  102  33  557  195
22    43.9  157  1   89   47   44  512   962   22  423   97  34  288  276
23   121.6  132  0   96   87   83  564   953   43   92   83  32  513  227
24    96.8  131  0  116   78   73  574  1038    7   36  142  42  540  176
25    52.3  130  0  116   63   57  641   984   14   26   70  21  486  196
26   199.3  131  0  121  160  143  631  1071    3   77  102  41  674  152
27    34.2  135  0  109   69   71  540   965    6    4   80  22  564  139
28   121.6  152  0  112   82   76  571  1018   10   79  103  28  537  215
29   104.3  119  0  107  166  157  521   938  168   89   92  36  637  154
30    69.6  166  1   89   58   54  521   973   46  254   72  26  396  237
31    37.3  140  0   93   55   54  535  1045    6   20  135  40  453  200
32    75.4  125  0  109   90   81  586   964   97   82  105  43  617  163
33   107.2  147  1  104   63   64  560   972   23   95   76  24  462  233
34    92.3  126  0  118   97   97  542   990   18   21  102  35  589  166
35    65.3  123  0  102   97   87  526   948  113   76  124  50  572  158
36   127.2  150  0  100  109   98  531   964    9   24   87  38  559  153
37    83.1  177  1   87   58   56  638   974   24  349   76  28  382  254
38    56.6  133  0  104   51   47  599  1024    7   40   99  27  425  225
39    82.6  149  1   88   61   54  515   953   36  165   86  35  395  251
40   115.1  145  1  104   82   74  560   981   96  126   88  31  488  228
41    88.0  148  0  122   72   66  601   998    9   19   84  20  590  144
42    54.2  141  0  109   56   54  523   968    4    2  107  37  489  170
43    82.3  162  1   99   75   70  522   996   40  208   73  27  496  224
44   103.0  136  0  121   95   96  574  1012   29   36  111  37  622  162
45    45.5  139  1   88   46   41  480   968   19   49  135  53  457  249
46    50.8  126  0  104  106   97  599   989   40   24   78  25  593  171
47    84.9  130  0  121   90   91  623  1049    3   22  113  40  588  160

Source: W. Vandaele, "Participation in Illegitimate Activities: Ehrlich Revisited," in A. Blumstein, J. Cohen, and D. Nagin, eds., Deterrence and Incapacitation, National

Definitions of variables:

Age = number of males of age 14-24 per 1,000 population.

S = indicator variable for southern states (0 = no, 1 = yes).

ED = mean number of years of schooling times 10 for persons age 25 or older.

EX0 = 1960 per capita expenditure on police by state and local government.

EX1 = 1959 per capita expenditure on police by state and local government.

LF = labor force participation rate per 1,000 civilian urban males age 14-24.

M = number of males per 1,000 females.

N = state population size in hundred thousands.

NW = number of nonwhites per 1,000 population.

U1 = unemployment rate of urban males per 1,000 of age 14-24.

U2 = unemployment rate of urban males per 1,000 of age 35-39.

W = median value of transferable goods and assets or family income in tens of dollars.

X = the number of families per 1,000 earning below one-half the median income.

Observation = state (47 states for the year 1960).
TABLE 10.17
Updated Longley Data, 1959-2005
Source: Department of Labor, http://siadapp.dmdc.osd.mil/personnel/MILITARY/Miltop.

Observation      Y         X1           X2        X3     X4        X5   X6
1959        64,630     82.908      509,300     3,740   2552   120,287    1
1960        65,778     84.074      529,500     3,852   2514   121,836    2
1961        65,746     85.015      548,200     4,714   2573   123,404    3
1962        66,702     86.186      589,700     3,911   2827   124,864    4
1963        67,762     87.103      622,200     4,070   2737   127,274    5
1964        69,305     88.438      668,500     3,786   2738   129,427    6
1965        71,088     90.055      724,400     3,366   2722   131,541    7
1966        72,895     92.624      792,900     2,875   3123   133,650    8
1967        74,372     95.491      838,000     2,975   3446   135,905    9
1968        75,920     99.56       916,100     2,817   3535   138,171   10
1969        77,902    104.504      990,700     2,832   3506   140,461   11
1970        78,678    110.046    1,044,900     4,093   3188   143,070   12
1971        79,367    115.549    1,134,700     5,016   2816   145,826   13
1972        82,153    120.556    1,246,800     4,882   2449   148,592   14
1973        85,064    127.307    1,395,300     4,365   2327   151,476   15
1974        86,794    138.82     1,515,500     5,156   2229   154,378   16
1975        85,846    151.857    1,651,300     7,929   2180   157,344   17
1976        88,752    160.68     1,842,100     7,406   2144   160,319   18
1977        92,017    170.884    2,051,200     6,991   2133   163,377   19
1978        96,048    182.863    2,316,300     6,202   2117   166,422   20
1979        98,824    198.077    2,595,300     6,137   2088   169,440   21
1980        99,303    216.073    2,823,700     7,637   2102   172,437   22
1981       100,397    236.385    3,161,400     8,273   2142   174,929   23
1982        99,526    250.798    3,291,500    10,678   2179   177,176   24
1983       100,834    260.68     3,573,800    10,717   2199   179,234   25
1984       105,005    270.496    3,969,500     8,539   2219   181,192   26
1985       107,150    278.759    4,246,800     8,312   2234   183,174   27
1986       109,597    284.895    4,480,600     8,237   2244   185,284   28
1987       112,440    292.691    4,757,400     7,425   2257   187,419   29
1988       114,968    302.68     5,127,400     6,701   2224   189,233   30
1989       117,342    314.179    5,510,600     6,528   2208   190,862   31
1990       118,793    326.357    5,837,900     7,047   2167   192,644   32
1991       117,718    337.747    6,026,300     8,628   2118   194,936   33
1992       118,492    345.477    6,367,400     9,613   1966   197,205   34
1993       120,259    353.516    6,689,300     8,940   1760   199,622   35
1994       123,060    361.026    7,098,400     7,996   1673   201,970   36
1995       124,900    368.444    7,433,400     7,404   1579   204,420   37
1996       126,708    375.429    7,851,900     7,236   1502   207,087   38
1997       129,558    381.663    8,337,300     6,739   1457   209,846   39
1998       131,463    385.881    8,768,300     6,210   1423   212,638   40
1999       133,488    391.452    9,302,200     5,880   1380   215,404   41
2000       136,891    399.986    9,855,900     5,692   1405   218,061   42
2001       136,933    409.582   10,171,600     6,801   1412   220,800   43
2002       136,485    416.704   10,500,200     8,378   1425   223,532   44
2003       137,736    425.553   11,017,600     8,774   1423   226,223   45
2004       139,252    437.795   11,762,100     8,149   1411   228,892   46
2005       141,730    451.946   12,502,400     7,591   1378   231,552   47
TABLE 10.18
Chemicals in Cheeses
Source: http://lib.stat.cmu.edu/DASL/Datafiles/Cheese.html.

Obs.   Taste      Acetic     H2S        Lactic
 1     12.30000   4.543000   3.135000   0.860000
 2     20.90000   5.159000   5.043000   1.530000
 3     39.00000   5.366000   5.438000   1.570000
 4     47.90000   5.759000   7.496000   1.810000
 5      5.600000  4.663000   3.807000   0.990000
 6     25.90000   5.697000   7.601000   1.090000
 7     37.30000   5.892000   8.726000   1.290000
 8     21.90000   6.078000   7.966000   1.780000
 9     18.10000   4.898000   3.850000   1.290000
10     21.00000   5.242000   4.174000   1.580000
11     34.90000   5.740000   6.142000   1.680000
12     57.20000   6.446000   7.908000   1.900000
13      0.700000  4.477000   2.996000   1.060000
14     25.90000   5.236000   4.942000   1.300000
15     54.90000   6.151000   6.752000   1.520000
16     40.90000   3.365000   9.588000   1.740000
17     15.90000   4.787000   3.912000   1.160000
18      6.400000  5.142000   4.700000   1.490000
19     18.00000   5.247000   6.174000   1.630000
20     38.90000   5.438000   9.064000   1.990000
21     14.00000   4.564000   4.949000   1.150000
22     15.20000   5.298000   5.220000   1.330000
23     32.00000   5.455000   9.242000   1.440000
24     56.70000   5.855000  10.19900    2.010000
25     16.80000   5.366000   3.664000   1.310000
26     11.60000   6.043000   3.219000   1.460000
27     26.50000   6.458000   6.962000   1.720000
28      0.700000  5.328000   3.912000   1.250000
29     13.40000   5.802000   6.685000   1.080000
30      5.500000  6.176000   4.787000   1.250000





Chapter 11

Heteroscedasticity: What Happens If the Error Variance Is Nonconstant?


An important assumption of the classical linear regression model (Assumption 4) is that the disturbances $u_i$ appearing in the population regression function are homoscedastic; that is, they all have the same variance. In this chapter we examine the validity of this assumption and find out what happens if this assumption is not fulfilled. As in Chapter 10, we seek answers to the following questions:

1. What is the nature of heteroscedasticity? 

2. What are its consequences? 

3. How does one detect it? 

4. What are the remedial measures? 

11.1 The Nature of Heteroscedasticity 

As noted in Chapter 3, one of the important assumptions of the classical linear regression model is that the variance of each disturbance term $u_i$, conditional on the chosen values of the explanatory variables, is some constant number equal to $\sigma^2$. This is the assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal variance. Symbolically,

$$E(u_i^2) = \sigma^2 \qquad i = 1, 2, \ldots, n \qquad (11.1.1)$$

Diagrammatically, in the two-variable regression model homoscedasticity can be shown as in Figure 3.4, which, for convenience, is reproduced as Figure 11.1. As Figure 11.1 shows, the conditional variance of $Y_i$ (which is equal to that of $u_i$), conditional upon the given $X_i$, remains the same regardless of the values taken by the variable X.

In contrast, consider Figure 11.2, which shows that the conditional variance of $Y_i$ increases as X increases. Here, the variances of $Y_i$ are not the same. Hence, there is heteroscedasticity. Symbolically,

$$E(u_i^2) = \sigma_i^2 \qquad (11.1.2)$$

FIGURE 11.1 Homoscedastic disturbances.

FIGURE 11.2 Heteroscedastic disturbances.




Notice the subscript of $\sigma_i^2$, which reminds us that the conditional variances of $u_i$ (= conditional variances of $Y_i$) are no longer constant.

To make the difference between homoscedasticity and heteroscedasticity clear, assume that in the two-variable model $Y_i = \beta_1 + \beta_2 X_i + u_i$, Y represents savings and X represents income. Figures 11.1 and 11.2 show that as income increases, savings on the average also increase. But in Figure 11.1 the variance of savings remains the same at all levels of income, whereas in Figure 11.2 it increases with income. It seems that in Figure 11.2 the higher-income families on the average save more than the lower-income families, but there is also more variability in their savings.
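The contrast between Figures 11.1 and 11.2 can be sketched by simulation. The intercept, slope, income levels, and error standard deviations below are hypothetical values chosen only for illustration; the sketch computes the spread of savings at each income level, first with a constant error variance and then with an error standard deviation proportional to income.

```python
import numpy as np

# Hypothetical savings-income model: savings = beta1 + beta2 * income + u
rng = np.random.default_rng(0)
beta1, beta2 = 1.0, 0.1
income = np.repeat(np.arange(10.0, 110.0, 10.0), 200)  # 10 income levels, 200 families each

# Figure 11.1 case: the same error standard deviation (2.0) at every income level
u_homo = rng.normal(0.0, 2.0, size=income.size)
# Figure 11.2 case: error standard deviation proportional to income (assumed 0.05 * income)
u_hetero = rng.normal(0.0, 0.05 * income)

savings_homo = beta1 + beta2 * income + u_homo
savings_hetero = beta1 + beta2 * income + u_hetero


def spread_by_level(x, y):
    """Return the distinct values of x and the sample standard deviation of y at each one."""
    levels = np.unique(x)
    return levels, np.array([y[x == level].std(ddof=1) for level in levels])


levels, sd_homo = spread_by_level(income, savings_homo)
_, sd_hetero = spread_by_level(income, savings_hetero)
for level, a, b in zip(levels, sd_homo, sd_hetero):
    print(f"income {level:5.0f}: sd(homoscedastic) = {a:5.2f}, sd(heteroscedastic) = {b:5.2f}")
```

In the first column the spread stays roughly constant across income levels; in the second it grows with income, which is the pattern of Figure 11.2.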

There are several reasons why the variances of $u_i$ may be variable, some of which are as follows.¹

1. Following the error-learning models, as people learn, their errors of behavior become smaller over time or the number of errors becomes more consistent. In this case, $\sigma_i^2$ is expected to decrease. As an example, consider Figure 11.3, which relates the number of typing errors made in a given time period on a test to the hours put in typing practice. As Figure 11.3 shows, as the number of hours of typing practice increases, the average number of typing errors as well as their variances decrease.

2. As incomes grow, people have more discretionary income² and hence more scope for choice about the disposition of their income. Hence, $\sigma_i^2$ is likely to increase with income. Thus in the regression of savings on income one is likely to find $\sigma_i^2$ increasing with income (as in Figure 11.2) because people have more choices about their savings behavior. Similarly, companies with larger profits are generally expected to show greater variability in their dividend policies than companies with lower profits. Also, growth-oriented companies are likely to show more variability in their dividend payout ratio than established companies.

1 See Stefan Valavanis, Econometrics, McGraw-Hill, New York, 1959, p. 48.

2 As Valavanis puts it, "Income grows, and people now barely discern dollars whereas previously they discerned dimes," ibid., p. 48.

FIGURE 11.3 Illustration of heteroscedasticity.

3. As data-collecting techniques improve, $\sigma_i^2$ is likely to decrease. Thus, banks that have sophisticated data processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their customers than banks without such facilities.

4. Heteroscedasticity can also arise as a result of the presence of outliers. An outlying observation, or outlier, is an observation that is much different (either very small or very large) in relation to the other observations in the sample. More precisely, an outlier is an observation from a population different from the one generating the remaining sample observations.³ The inclusion or exclusion of such an observation, especially if the sample size is small, can substantially alter the results of regression analysis.

As an example, consider the scattergram given in Figure 11.4. Based on the data given in Table 11.9 in Exercise 11.22, this figure plots the percent rate of change of stock prices (Y) and consumer prices (X) for the post-World War II period through 1969 for 20 countries. In this figure the observation on Y and X for Chile can be regarded as an outlier because the given Y and X values are much larger than for the rest of the countries. In situations such as this, it would be hard to maintain the assumption of homoscedasticity. In Exercise 11.22, you are asked to find out what happens to the regression results if the observations for Chile are dropped from the analysis.

5. Another source of heteroscedasticity arises from violating Assumption 9 of the classical linear regression model (CLRM), namely, that the regression model is correctly specified. Although we will discuss the topic of specification errors more fully in Chapter 13, very often what looks like heteroscedasticity may be due to the fact that some important variables are omitted from the model. Thus, in the demand function for a commodity, if we do not include the prices of commodities complementary to or competing with the commodity in question (the omitted variable bias), the residuals obtained from the regression may give the distinct impression that the error variance may not be constant. But if the omitted variables are included in the model, that impression may disappear.


3 I am indebted to Michael McAleer for pointing this out to me.

FIGURE 11.4 The relationship between stock prices and consumer prices (horizontal axis: consumer prices, % change).

FIGURE 11.5 Residuals from the regression of (a) impressions on advertising expenditure and (b) impressions on Adexp and Adexp².

As a concrete example, recall our study of advertising impressions retained (Y) in relation to advertising expenditure (X). (See Exercise 8.32.) If you regress Y on X only and observe the residuals from this regression, you will see one pattern, but if you regress Y on X and X², you will see another pattern, which can be seen clearly from Figure 11.5. We have already seen that X² belongs in the model. (See Exercise 8.32.)

6. Another source of heteroscedasticity is skewness in the distribution of one or more 
regressors included in the model. Examples are economic variables such as income, 
wealth, and education. It is well known that the distribution of income and wealth in most 
societies is uneven, with the bulk of the income and wealth being owned by a few at the top. 

7. Other sources of heteroscedasticity: As David Hendry notes, heteroscedasticity can also arise because of (1) incorrect data transformation (e.g., ratio or first difference transformations) and (2) incorrect functional form (e.g., linear versus log-linear models).⁴






4 David F. Hendry, Dynamic Econometrics, Oxford University Press, 1995, p. 45.

Note that the problem of heteroscedasticity is likely to be more common in cross-sectional than in time series data. In cross-sectional data, one usually deals with members of a population at a given point in time, such as individual consumers or their families, firms, industries, or geographical subdivisions such as state, country, city, etc. Moreover, these members may be of different sizes, such as small, medium, or large firms or low, medium, or high income. In time series data, on the other hand, the variables tend to be of similar orders of magnitude because one generally collects the data for the same entity over a period of time. Examples are gross national product (GNP), consumption expenditure, savings, or employment in the United States, say, for the period 1955-2005.

As an illustration of heteroscedasticity likely to be encountered in cross-sectional analysis, consider Table 11.1. This table gives data on compensation per employee in 10 nondurable goods manufacturing industries, classified by the employment size of the firm or the establishment for the year 1958. Also given in the table are average productivity figures for nine employment classes.

Although the industries differ in their output composition, Table 11.1 shows clearly that on the average large firms pay more than small firms. As an example, firms employing one to four employees paid on the average about $3,396, whereas those employing 1,000 to 2,499 employees on the average paid about $4,843. But notice that there is considerable variability in earnings among various employment classes as indicated by the estimated standard deviations of earnings. This can also be seen from Figure 11.6, which plots the standard deviation of compensation against average compensation in each employment class. As can be seen clearly, on average, the standard deviation of compensation increases with the average value of compensation.

TABLE 11.1 Compensation per Employee ($) in Nondurable Manufacturing Industries According to Employment Size of Establishment, 1958

                              Employment Size (average number of employees)
Industry                           1-4     5-9   10-19   20-49   50-99 100-249 250-499 500-999 1,000-2,499
Food and kindred products        2,994   3,295   3,565   3,907   4,189   4,486   4,676   4,968       5,342
Tobacco products                 1,721   2,057   3,336   3,320   2,980   2,848   3,072   2,969       3,822
Textile mill products            3,600   3,657   3,674   3,437   3,340   3,334   3,225   3,163       3,168
Apparel and related products     3,494   3,787   3,533   3,215   3,030   2,834   2,750   2,967       3,453
Paper and allied products        3,498   3,847   3,913   4,135   4,445   4,885   5,132   5,342       5,326
Printing and publishing          3,611   4,206   4,695   5,083   5,301   5,269   5,182   5,395       5,552
Chemicals and allied products    3,875   4,660   4,930   5,005   5,114   5,248   5,630   5,870       5,876
Petroleum and coal products      4,616   5,181   5,317   5,337   5,421   5,710   6,316   6,455       6,347
Rubber and plastic products      3,538   3,984   4,014   4,287   4,221   4,539   4,721   4,905       5,481
Leather and leather products     3,016   3,196   3,149   3,317   3,414   3,254   3,177   3,346       4,067
Average compensation             3,396   3,787   4,013   4,104   4,146   4,241   4,388   4,538       4,843
Standard deviation               742.2   851.4   727.8  805.06   929.9 1,080.6 1,241.2 1,307.7     1,110.7
Average productivity             9,355   8,584   7,962   8,275   8,389   9,418   9,795  10,281      11,750

Source: The Census of Manufacturers, U.S. Department of Commerce, 1958 (computed by author).

FIGURE 11.6 Standard deviation of compensation and mean compensation (horizontal axis: mean compensation).
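The summary rows of Table 11.1 can be recomputed from the industry rows. The sketch below assumes the standard deviations in the table are sample standard deviations across the 10 industries (divisor n - 1), which reproduces the printed first and last columns to one decimal; it also computes the correlation between the class means and standard deviations, the relationship plotted in Figure 11.6.

```python
import numpy as np

# Compensation per employee ($) from Table 11.1.
# Rows: industries; columns: employment-size classes 1-4, 5-9, ..., 1,000-2,499.
comp = np.array([
    [2994, 3295, 3565, 3907, 4189, 4486, 4676, 4968, 5342],  # Food and kindred products
    [1721, 2057, 3336, 3320, 2980, 2848, 3072, 2969, 3822],  # Tobacco products
    [3600, 3657, 3674, 3437, 3340, 3334, 3225, 3163, 3168],  # Textile mill products
    [3494, 3787, 3533, 3215, 3030, 2834, 2750, 2967, 3453],  # Apparel and related products
    [3498, 3847, 3913, 4135, 4445, 4885, 5132, 5342, 5326],  # Paper and allied products
    [3611, 4206, 4695, 5083, 5301, 5269, 5182, 5395, 5552],  # Printing and publishing
    [3875, 4660, 4930, 5005, 5114, 5248, 5630, 5870, 5876],  # Chemicals and allied products
    [4616, 5181, 5317, 5337, 5421, 5710, 6316, 6455, 6347],  # Petroleum and coal products
    [3538, 3984, 4014, 4287, 4221, 4539, 4721, 4905, 5481],  # Rubber and plastic products
    [3016, 3196, 3149, 3317, 3414, 3254, 3177, 3346, 4067],  # Leather and leather products
], dtype=float)

mean_comp = comp.mean(axis=0)        # "Average compensation" row
sd_comp = comp.std(axis=0, ddof=1)   # "Standard deviation" row (sample s.d., divisor n - 1)
r = np.corrcoef(mean_comp, sd_comp)[0, 1]

for m, s in zip(mean_comp, sd_comp):
    print(f"mean = {m:8.1f}   sd = {s:7.1f}")
print(f"correlation between class means and standard deviations: {r:.2f}")
```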


11.2 OLS Estimation in the Presence of Heteroscedasticity 


What happens to ordinary least squares (OLS) estimators and their variances if we introduce heteroscedasticity by letting $E(u_i^2) = \sigma_i^2$ but retain all other assumptions of the classical model? To answer this question, let us revert to the two-variable model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

Applying the usual formula, the OLS estimator of $\beta_2$ is

$$\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} \qquad (11.2.1)$$


but its variance is now given by the following expression (see Appendix 11A, Section 11A.1):

$$\operatorname{var}(\hat\beta_2) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2} \qquad (11.2.2)$$

which is obviously different from the usual variance formula obtained under the assumption of homoscedasticity, namely,

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_i^2} \qquad (11.2.3)$$

Of course, if $\sigma_i^2 = \sigma^2$ for each i, the two formulas will be identical. (Why?)

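A quick numerical check of Eqs. (11.2.2) and (11.2.3): with a constant variance the two formulas give the same number, and with a variance that changes across observations they do not. The regressor values and variances below are hypothetical.

```python
import numpy as np


def var_ols_het(X, sigma2):
    """Eq. (11.2.2): sum(x_i^2 * sigma_i^2) / (sum x_i^2)^2, with x_i = X_i - mean(X)."""
    x = X - X.mean()
    return np.sum(x**2 * sigma2) / np.sum(x**2) ** 2


def var_ols_homo(X, sigma2):
    """Eq. (11.2.3): sigma^2 / sum x_i^2 for a single common variance sigma^2."""
    x = X - X.mean()
    return sigma2 / np.sum(x**2)


X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical regressor values; sum x_i^2 = 40

# sigma_i^2 = 3 for every i: both formulas give 3/40 = 0.075
same_het = var_ols_het(X, np.full(X.size, 3.0))
same_homo = var_ols_homo(X, 3.0)

# sigma_i^2 = X_i^2 (hypothetical): Eq. (11.2.2) gives 1984/1600 = 1.24
het = var_ols_het(X, X**2)
print(same_het, same_homo, het)
```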
Recall that $\hat\beta_2$ is the best linear unbiased estimator (BLUE) if the assumptions of the classical model, including homoscedasticity, hold. Is it still BLUE when we drop only the homoscedasticity assumption and replace it with the assumption of heteroscedasticity? It is easy to prove that $\hat\beta_2$ is still linear and unbiased. As a matter of fact, as shown in Appendix 3A, Section 3A.2, to establish the unbiasedness of $\hat\beta_2$ it is not necessary that the disturbances ($u_i$) be homoscedastic. In fact, the variance of $u_i$, homoscedastic or heteroscedastic, plays no part in the determination of the unbiasedness property. Recall that in Appendix 3A, Section 3A.7, we showed that $\hat\beta_2$ is a consistent estimator under the assumptions of the classical linear regression model. Although we will not prove it, it can be shown that $\hat\beta_2$ is a consistent estimator despite heteroscedasticity; that is, as the sample size increases indefinitely, $\hat\beta_2$ converges to its true value $\beta_2$. Furthermore, it can also be shown that under certain conditions (called regularity conditions), $\hat\beta_2$ is asymptotically normally distributed. Of course, what we have said about $\hat\beta_2$ also holds true of other parameters of a multiple regression model.
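The unbiasedness claim can be illustrated with a small Monte Carlo experiment: holding X fixed, draw heteroscedastic errors many times, compute the OLS slope from Eq. (11.2.1) each time, and average. The parameter values, the sample size, and the assumption that the error standard deviation is proportional to X are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
beta1, beta2 = 1.0, 1.0             # hypothetical true parameters
n, reps = 100, 5000
X = rng.uniform(1.0, 10.0, size=n)  # regressor held fixed in repeated sampling
x = X - X.mean()
sigma = 0.5 * X                     # assumed: error s.d. proportional to X (heteroscedastic)

# Each row is one replication of the sample
u = rng.normal(0.0, sigma, size=(reps, n))
Y = beta1 + beta2 * X + u

# OLS slope, Eq. (11.2.1): sum(x_i * y_i) / sum(x_i^2), with y_i = Y_i - mean(Y)
b2 = (Y - Y.mean(axis=1, keepdims=True)) @ x / np.sum(x**2)

print(f"average of OLS slope estimates: {b2.mean():.4f}  (true value {beta2})")
```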

Granted that $\hat\beta_2$ is still linear, unbiased, and consistent, is it “efficient” or “best”? That is, does it have minimum variance in the class of unbiased estimators? And is that minimum variance given by Eq. (11.2.2)? The answer is no to both questions: $\hat\beta_2$ is no longer best and the minimum variance is not given by Eq. (11.2.2). Then what is BLUE in the presence of heteroscedasticity? The answer is given in the following section.


11.3 The Method of Generalized Least Squares (GLS) 


Why is the usual OLS estimator of $\beta_2$ given in Eq. (11.2.1) not best, although it is still unbiased? Intuitively, we can see the reason from Table 11.1. As the table shows, there is considerable variability in the earnings between employment classes. If we were to regress per-employee compensation on the size of employment, we would like to make use of the knowledge that there is considerable interclass variability in earnings. Ideally, we would like to devise the estimating scheme in such a manner that observations coming from populations with greater variability are given less weight than those coming from populations with smaller variability. Examining Table 11.1, we would like to weight observations coming from employment classes 10-19 and 20-49 more heavily than those coming from employment classes like 5-9 and 250-499, for the former are more closely clustered around their mean values than the latter, thereby enabling us to estimate the population regression function (PRF) more accurately.

Unfortunately, the usual OLS method does not follow this strategy and therefore does not make use of the “information” contained in the unequal variability of the dependent variable Y, say, employee compensation of Table 11.1: It assigns equal weight or importance to each observation. But a method of estimation, known as generalized least squares (GLS), takes such information into account explicitly and is therefore capable of producing estimators that are BLUE. To see how this is accomplished, let us continue with the now-familiar two-variable model:


$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (11.3.1)$$

which for ease of algebraic manipulation we write as

$$Y_i = \beta_1 X_{0i} + \beta_2 X_i + u_i \qquad (11.3.2)$$

where $X_{0i} = 1$ for each i. The reader can see that these two formulations are identical.

Now assume that the heteroscedastic variances $\sigma_i^2$ are known. Divide Eq. (11.3.2) through by $\sigma_i$ to obtain

$$\frac{Y_i}{\sigma_i} = \beta_1\left(\frac{X_{0i}}{\sigma_i}\right) + \beta_2\left(\frac{X_i}{\sigma_i}\right) + \left(\frac{u_i}{\sigma_i}\right) \qquad (11.3.3)$$

which for ease of exposition we write as

$$Y_i^* = \beta_1^* X_{0i}^* + \beta_2^* X_i^* + u_i^* \qquad (11.3.4)$$

where the starred, or transformed, variables are the original variables divided by (the known) $\sigma_i$. We use the notation $\beta_1^*$ and $\beta_2^*$, the parameters of the transformed model, to distinguish them from the usual OLS parameters $\beta_1$ and $\beta_2$.

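What is the purpose of transforming the original model? To see this, notice the following feature of the transformed error term $u_i^*$:

$$\begin{aligned}
\operatorname{var}(u_i^*) &= E(u_i^*)^2 = E\left(\frac{u_i}{\sigma_i}\right)^2 && \text{since } E(u_i^*) = 0 \\
&= \frac{1}{\sigma_i^2}\, E(u_i^2) && \text{since } \sigma_i^2 \text{ is known} \\
&= \frac{1}{\sigma_i^2}\left(\sigma_i^2\right) && \text{since } E(u_i^2) = \sigma_i^2 \\
&= 1 && (11.3.5)
\end{aligned}$$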

which is a constant. That is, the variance of the transformed disturbance term $u_i^*$ is now homoscedastic. Since we are still retaining the other assumptions of the classical model, the finding that it is $u^*$ that is homoscedastic suggests that if we apply OLS to the transformed model (11.3.3) it will produce estimators that are BLUE. In short, the estimated $\hat\beta_1^*$ and $\hat\beta_2^*$ are now BLUE and not the OLS estimators $\hat\beta_1$ and $\hat\beta_2$.
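Eq. (11.3.5) can be checked numerically: if the $\sigma_i$ are known, dividing each disturbance by its own $\sigma_i$ gives transformed disturbances whose variance is about 1 at every observation, even though the original variances differ. The $\sigma_i$ values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(1.0, 10.0, 10)   # hypothetical regressor values
sigma = 0.5 * X                  # assumed known sigma_i, growing with X

# 20,000 draws of the disturbance at each of the 10 observations
u = rng.normal(0.0, sigma, size=(20000, X.size))
u_star = u / sigma               # transformed disturbances u_i* = u_i / sigma_i

print("var(u_i): ", np.round(u.var(axis=0), 2))       # roughly sigma_i^2, rising with X
print("var(u_i*):", np.round(u_star.var(axis=0), 2))  # roughly 1 at every observation
```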

This procedure of transforming the original variables in such a way that the transformed variables satisfy the assumptions of the classical model and then applying OLS to them is known as the method of generalized least squares (GLS). In short, GLS is OLS on the transformed variables that satisfy the standard least-squares assumptions. The estimators thus obtained are known as GLS estimators, and it is these estimators that are BLUE.

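The actual mechanics of estimating $\beta_1^*$ and $\beta_2^*$ are as follows. First, we write down the sample regression function (SRF) of Eq. (11.3.3)

$$\frac{Y_i}{\sigma_i} = \hat\beta_1^*\left(\frac{X_{0i}}{\sigma_i}\right) + \hat\beta_2^*\left(\frac{X_i}{\sigma_i}\right) + \left(\frac{\hat u_i}{\sigma_i}\right)$$

or

$$Y_i^* = \hat\beta_1^* X_{0i}^* + \hat\beta_2^* X_i^* + \hat u_i^* \qquad (11.3.6)$$

Now, to obtain the GLS estimators, we minimize

$$\sum \hat u_i^{*2} = \sum \left(Y_i^* - \hat\beta_1^* X_{0i}^* - \hat\beta_2^* X_i^*\right)^2$$

that is,

$$\sum \left(\frac{\hat u_i}{\sigma_i}\right)^2 = \sum \left[\left(\frac{Y_i}{\sigma_i}\right) - \hat\beta_1^*\left(\frac{X_{0i}}{\sigma_i}\right) - \hat\beta_2^*\left(\frac{X_i}{\sigma_i}\right)\right]^2 \qquad (11.3.7)$$

The actual mechanics of minimizing Eq. (11.3.7) follow the standard calculus techniques and are given in Appendix 11A, Section 11A.2. As shown there, the GLS estimator of $\beta_2^*$ is

$$\hat\beta_2^* = \frac{\left(\sum w_i\right)\left(\sum w_i X_i Y_i\right) - \left(\sum w_i X_i\right)\left(\sum w_i Y_i\right)}{\left(\sum w_i\right)\left(\sum w_i X_i^2\right) - \left(\sum w_i X_i\right)^2} \qquad (11.3.8)$$

and its variance is given by

$$\operatorname{var}(\hat\beta_2^*) = \frac{\sum w_i}{\left(\sum w_i\right)\left(\sum w_i X_i^2\right) - \left(\sum w_i X_i\right)^2} \qquad (11.3.9)$$

where $w_i = 1/\sigma_i^2$.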
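Eqs. (11.3.8) and (11.3.9) translate directly into code. The sketch below, on hypothetical simulated data with the $\sigma_i$ assumed known, computes the GLS/WLS slope and its variance from the weighted sums, then checks that OLS applied to the starred variables of Eq. (11.3.4), with no intercept and regressors $X_{0i}^* = 1/\sigma_i$ and $X_i^* = X_i/\sigma_i$, gives the same slope and the same variance.

```python
import numpy as np


def wls_slope(X, Y, sigma2):
    """GLS/WLS slope (Eq. 11.3.8) and its variance (Eq. 11.3.9) with weights w_i = 1 / sigma_i^2."""
    w = 1.0 / sigma2
    sw, swx, swy = w.sum(), (w * X).sum(), (w * Y).sum()
    swxx, swxy = (w * X * X).sum(), (w * X * Y).sum()
    denom = sw * swxx - swx**2
    return (sw * swxy - swx * swy) / denom, sw / denom


rng = np.random.default_rng(7)
X = rng.uniform(1.0, 10.0, size=50)        # hypothetical data
sigma = 0.5 * X                            # sigma_i assumed known
Y = 1.0 + 1.0 * X + rng.normal(0.0, sigma)

b2_star, var_b2_star = wls_slope(X, Y, sigma**2)

# OLS on the transformed model (11.3.4): regress Y* on X0* and X* without an intercept
Z = np.column_stack([1.0 / sigma, X / sigma])
coef, *_ = np.linalg.lstsq(Z, Y / sigma, rcond=None)
# Since var(u*) = 1, the slope's OLS variance is the second diagonal element of (Z'Z)^-1
var_transformed = np.linalg.inv(Z.T @ Z)[1, 1]

print(b2_star, coef[1])
print(var_b2_star, var_transformed)
```

Writing the estimator from the weighted sums avoids forming the transformed variables; the `lstsq` route is closer to the "OLS on transformed data" description in the text.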

Difference between OLS and GLS 

Recall from Chapter 3 that in OLS we minimize 

$$\sum \hat u_i^2 = \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_i\right)^2 \qquad (11.3.10)$$

but in GLS we minimize the expression (11.3.7), which can also be written as

$$\sum w_i \hat u_i^2 = \sum w_i \left(Y_i - \hat\beta_1^* X_{0i} - \hat\beta_2^* X_i\right)^2 \qquad (11.3.11)$$

where $w_i = 1/\sigma_i^2$ (verify that Eq. [11.3.11] and Eq. [11.3.7] are identical).

Thus, in GLS we minimize a weighted sum of residual squares with $w_i = 1/\sigma_i^2$ acting as the weights, but in OLS we minimize an unweighted or (what amounts to the same thing) equally weighted residual sum of squares (RSS). As Eq. (11.3.7) shows, in GLS each observation is divided by its $\sigma_i$, so that its squared residual carries the weight $w_i = 1/\sigma_i^2$; that is, observations coming from a population with larger $\sigma_i$ will get relatively smaller weight and those from a population with smaller $\sigma_i$ will get proportionately larger weight in minimizing the RSS (11.3.11). To see the difference between OLS and GLS clearly, consider the hypothetical scattergram given in Figure 11.7.

In the (unweighted) OLS, each $\hat u_i^2$ associated with points A, B, and C will receive the same weight in minimizing the RSS. Obviously, in this case the $\hat u_i^2$ associated with point C will dominate the RSS. But in GLS the extreme observation C will get relatively smaller weight than the other two observations. As noted earlier, this is the right strategy, for in estimating the population regression function (PRF) more reliably we would like to give more weight to observations that are closely clustered around their (population) mean than to those that are widely scattered about.

Since Eq. (11.3.11) minimizes a weighted RSS, it is appropriately known as weighted 
least squares (WLS), and the estimators thus obtained and given in Eqs. (11.3.8) and (11.3.9) 
are known as WLS estimators. But WLS is just a special case of the more general estimating 
technique, GLS. In the context of heteroscedasticity, one can treat the two terms WLS and 
GLS interchangeably. In later chapters we will come across other special cases of GLS. 

In passing, note that if $w_i = w$, a constant for all i, $\hat\beta_2^*$ is identical with $\hat\beta_2$ and $\operatorname{var}(\hat\beta_2^*)$ is identical with the usual (i.e., homoscedastic) $\operatorname{var}(\hat\beta_2)$ given in Eq. (11.2.3), which should not be surprising. (Why?) (See Exercise 11.8.)

FIGURE 11.7 Hypothetical scattergram.
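The special case $w_i = w$ noted above can be confirmed numerically. With hypothetical data and a common variance $\sigma^2 = 4$ (so every $w_i = 1/4$), Eq. (11.3.8) returns the OLS slope of Eq. (11.2.1), and Eq. (11.3.9) returns $\sigma^2/\sum x_i^2$ from Eq. (11.2.3).

```python
import numpy as np

X = np.array([1.0, 3.0, 4.0, 7.0, 10.0])    # hypothetical data
Y = np.array([2.1, 3.9, 5.2, 8.1, 10.8])
sigma2 = 4.0                                 # common variance, so w_i = 1/4 for every i
w = np.full_like(X, 1.0 / sigma2)

# Weighted formulas, Eqs. (11.3.8) and (11.3.9)
denom = w.sum() * (w * X**2).sum() - (w * X).sum() ** 2
b2_wls = (w.sum() * (w * X * Y).sum() - (w * X).sum() * (w * Y).sum()) / denom
var_wls = w.sum() / denom

# Unweighted formulas, Eqs. (11.2.1) and (11.2.3)
x, y = X - X.mean(), Y - Y.mean()
b2_ols = (x * y).sum() / (x**2).sum()
var_ols = sigma2 / (x**2).sum()

print(b2_wls, b2_ols)
print(var_wls, var_ols)
```

Algebraically, a common weight w factors out as $w^2$ in both the numerator and the denominator of Eq. (11.3.8), which is why the two slopes coincide.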



11.4 Consequences of Using OLS in the Presence of Heteroscedasticity

As we have seen, both $\hat\beta_2^*$ and $\hat\beta_2$ are (linear) unbiased estimators: In repeated sampling, on the average, $\hat\beta_2^*$ and $\hat\beta_2$ will equal the true $\beta_2$; that is, they are both unbiased estimators. But we know that it is $\hat\beta_2^*$ that is efficient, that is, has the smallest variance. What happens to our confidence intervals, hypothesis tests, and other procedures if we continue to use the OLS estimator $\hat\beta_2$? We distinguish two cases.

OLS Estimation Allowing for Heteroscedasticity 

Suppose we use $\hat\beta_2$ and use the variance formula given in Eq. (11.2.2), which takes into account heteroscedasticity explicitly. Using this variance, and assuming the $\sigma_i^2$ are known, can we establish confidence intervals and test hypotheses with the usual t and F tests? The answer generally is no because it can be shown that $\operatorname{var}(\hat\beta_2^*) \le \operatorname{var}(\hat\beta_2)$,⁵ which means that confidence intervals based on the latter will be unnecessarily larger. As a result, the t and F tests are likely to give us inaccurate results in that $\operatorname{var}(\hat\beta_2)$ is overly large and what appears to be a statistically insignificant coefficient (because the t value is smaller than what is appropriate) may in fact be significant if the correct confidence intervals were established on the basis of the GLS procedure.
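The inequality $\operatorname{var}(\hat\beta_2^*) \le \operatorname{var}(\hat\beta_2)$ can be checked by evaluating Eq. (11.3.9) and Eq. (11.2.2) for the same X values and the same $\sigma_i^2$. The X values and the variance pattern $\sigma_i^2 = X_i^{\alpha}$ (borrowed from the Monte Carlo design described later in this section) are hypothetical choices; a numerical check like this illustrates the inequality but does not prove it.

```python
import numpy as np

X = np.linspace(1.0, 10.0, 20)   # hypothetical regressor values
x = X - X.mean()

results = {}
for alpha in (0.0, 0.5, 1.0, 2.0):
    sigma2 = X**alpha                                            # assumed variance pattern
    var_ols = np.sum(x**2 * sigma2) / np.sum(x**2) ** 2          # Eq. (11.2.2)
    w = 1.0 / sigma2
    var_gls = w.sum() / (w.sum() * (w * X**2).sum() - (w * X).sum() ** 2)  # Eq. (11.3.9)
    results[alpha] = (var_ols, var_gls)
    print(f"alpha = {alpha}: var(OLS) = {var_ols:.5f}, var(GLS) = {var_gls:.5f}, "
          f"ratio = {var_gls / var_ols:.3f}")
```

With alpha = 0 the variance is constant and the two formulas agree; as alpha grows, the GLS variance falls further below the OLS variance.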

OLS Estimation Disregarding Heteroscedasticity 

The situation can become serious if we not only use $\hat\beta_2$ but also continue to use the usual (homoscedastic) variance formula given in Eq. (11.2.3) even if heteroscedasticity is present or suspected. Note that this is the more likely of the two cases we discuss here, because running a standard OLS regression package and ignoring (or being ignorant of) heteroscedasticity will yield the variance of $\hat\beta_2$ as given in Eq. (11.2.3). First of all, $\operatorname{var}(\hat\beta_2)$ given in Eq. (11.2.3) is a biased estimator of $\operatorname{var}(\hat\beta_2)$ given in Eq. (11.2.2); that is, on the average it overestimates or underestimates the latter, and in general we cannot tell whether the bias is positive (overestimation) or negative (underestimation) because it depends on the nature of the relationship between $\sigma_i^2$ and the values taken by the explanatory variable X, as can be seen clearly from Eq. (11.2.2) (see Exercise 11.9). The bias arises from the fact that $\hat\sigma^2$, the conventional estimator of $\sigma^2$, namely, $\sum \hat u_i^2/(n-2)$, is no longer an unbiased estimator of the latter when heteroscedasticity is present (see Appendix 11A.3). As a result, we can no longer rely on the conventionally computed confidence intervals and the conventionally employed t and F tests.⁶ In short, if we persist in using the usual testing procedures despite heteroscedasticity, whatever conclusions we draw or inferences we make may be very misleading.

5 A formal proof can be found in Phoebus J. Dhrymes, Introductory Econometrics, Springer-Verlag, New York, 1978, pp. 110-111. In passing, note that the loss of efficiency of $\hat\beta_2$ (i.e., by how much $\operatorname{var}[\hat\beta_2]$ exceeds $\operatorname{var}[\hat\beta_2^*]$) depends on the sample values of the X variables and the value of $\sigma_i^2$.

To throw more light on this topic, we refer to a Monte Carlo study conducted by Davidson and MacKinnon.⁷ They consider the following simple model, which in our notation is

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (11.4.1)$$

They assume that $\beta_1 = 1$, $\beta_2 = 1$, and $u_i \sim N(0, X_i^{\alpha})$. As the last expression shows, they assume that the error variance is heteroscedastic and is related to the value of the regressor X with power $\alpha$. If, for example, $\alpha = 1$, the error variance is proportional to the value of X; if $\alpha = 2$, the error variance is proportional to the square of the value of X, and so on. In Section 11.6 we will consider the logic behind such a procedure. Based on 20,000 replications and allowing for various values for $\alpha$, they obtain the standard errors of the two regression coefficients using OLS (see Eq. [11.2.3]), OLS allowing for heteroscedasticity (see Eq. [11.2.2]), and GLS (see Eq. [11.3.9]). We quote their results for selected values of $\alpha$:



               Standard error of β̂1          Standard error of β̂2
Value of α     OLS     OLShet   GLS          OLS     OLShet   GLS
0.5            0.164   0.134    0.110        0.285   0.277    0.243
1.0            0.142   0.101    0.048        0.246   0.247    0.173
2.0            0.116   0.074    0.0073       0.200   0.220    0.109
3.0            0.100   0.064    0.0013       0.173   0.206    0.056
4.0            0.089   0.059    0.0003       0.154   0.195    0.017

Note: OLShet means OLS allowing for heteroscedasticity.






The most striking feature of these results is that OLS, with or without correction for heteroscedasticity, consistently overestimates the true standard error obtained by the (correct) GLS procedure, especially for large values of $\alpha$, thus establishing the superiority of GLS. These results also show that if we do not use GLS and rely on OLS, allowing for or not allowing for heteroscedasticity, the picture is mixed. The usual OLS standard errors are either too large (for the intercept) or generally too small (for the slope coefficient) in relation to those obtained by OLS allowing for heteroscedasticity. The message is clear: In the presence of heteroscedasticity, use GLS. However, for reasons explained later in the chapter, in practice it is not always easy to apply GLS. Also, as we discuss later, unless heteroscedasticity is very severe, one may not need to abandon OLS in favor of GLS or WLS.
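The flavor of the Davidson-MacKinnon experiment can be reproduced on a small scale. The sketch below follows model (11.4.1) with $\beta_1 = \beta_2 = 1$, $u_i \sim N(0, X_i^{\alpha})$, and $\alpha = 2$, but the X design (100 values drawn uniformly between 0.5 and 2), the seed, and the 4,000 replications are assumptions, since the text does not report their setup; the numbers will therefore not match the table above. It compares the sampling standard deviation of the OLS and GLS slopes with the average conventional OLS standard error from Eq. (11.2.3).

```python
import numpy as np

rng = np.random.default_rng(2024)
n, reps, alpha = 100, 4000, 2.0
X = rng.uniform(0.5, 2.0, size=n)   # assumed X design (not given in the text), fixed across replications
x = X - X.mean()
sigma2 = X**alpha                   # u_i ~ N(0, X_i^alpha)
w = 1.0 / sigma2

u = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
Y = 1.0 + 1.0 * X + u               # beta1 = beta2 = 1

# OLS slope (Eq. 11.2.1) and intercept for every replication
b2_ols = (Y - Y.mean(axis=1, keepdims=True)) @ x / np.sum(x**2)
b1_ols = Y.mean(axis=1) - b2_ols * X.mean()

# Conventional OLS standard error (Eq. 11.2.3) with sigma^2 estimated by RSS / (n - 2)
resid = Y - b1_ols[:, None] - b2_ols[:, None] * X
se_conventional = np.sqrt((resid**2).sum(axis=1) / (n - 2) / np.sum(x**2))

# GLS/WLS slope with known weights (Eq. 11.3.8)
sw, swx, swxx = w.sum(), (w * X).sum(), (w * X**2).sum()
b2_gls = (sw * (Y * (w * X)).sum(axis=1) - swx * (Y * w).sum(axis=1)) / (sw * swxx - swx**2)

print(f"sd of OLS slope over replications: {b2_ols.std():.4f}")
print(f"mean conventional OLS s.e.:         {se_conventional.mean():.4f}")
print(f"sd of GLS slope over replications: {b2_gls.std():.4f}")
```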

From the preceding discussion it is clear that heteroscedasticity is potentially a serious problem and the researcher needs to know whether it is present in a given situation. If its presence is detected, then one can take corrective action, such as using the weighted least-squares regression or some other technique. Before we turn to examining the various corrective procedures, however, we must first find out whether heteroscedasticity is present or likely to be present in a given case. This topic is discussed in the following section.

6 From Eq. (5.3.6) we know that the $100(1-\alpha)\%$ confidence interval for $\beta_2$ is $[\hat\beta_2 \pm t_{\alpha/2}\,\operatorname{se}(\hat\beta_2)]$. But if $\operatorname{se}(\hat\beta_2)$ cannot be estimated unbiasedly, what trust can we put in the conventionally computed confidence interval?

7 Russell Davidson and James C. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 1993, pp. 549-550.

A Technical Note 

Although we have stated that, in cases of heteroscedasticity, it is the GLS, not the OLS, that is BLUE, there are examples where OLS can be BLUE, despite heteroscedasticity.⁸ But such examples are infrequent in practice.

11.5 Detection of Heteroscedasticity 

As with multicollinearity, the important practical question is: How does one know that heteroscedasticity is present in a specific situation? Again, as in the case of multicollinearity, there are no hard-and-fast rules for detecting heteroscedasticity, only a few rules of thumb. But this situation is inevitable because $\sigma_i^2$ can be known only if we have the entire Y population corresponding to the chosen X's, such as the population shown in Table 2.1 or Table 11.1. But such data are an exception rather than the rule in most economic investigations. In this respect the econometrician differs from scientists in fields such as agriculture and biology, where researchers have a good deal of control over their subjects. More often than not, in economic studies there is only one sample Y value corresponding to a particular value of X. And there is no way one can know $\sigma_i^2$ from just one Y observation. Therefore, in most cases involving econometric investigations, heteroscedasticity may be a matter of intuition, educated guesswork, prior empirical experience, or sheer speculation.

With the preceding caveat in mind, let us examine some of the informal and formal methods of detecting heteroscedasticity. As the following discussion will reveal, most of these methods are based on the examination of the OLS residuals $\hat u_i$, since they are the ones we observe, and not the disturbances $u_i$. One hopes that they are good estimates of $u_i$, a hope that may be fulfilled if the sample size is fairly large.

Informal Methods 

Nature of the Problem 

Very often the nature of the problem under consideration suggests whether heteroscedasticity is likely to be encountered. For example, following the pioneering work of Prais and Houthakker on family budget studies, where they found that the residual variance around the regression of consumption on income increased with income, one now generally assumes that in similar surveys one can expect unequal variances among the disturbances.⁹ As a matter of fact, in cross-sectional data involving heterogeneous units, heteroscedasticity may be the rule rather than the exception. Thus, in a cross-sectional analysis involving the investment expenditure in relation to sales, rate of interest, etc., heteroscedasticity is generally expected if small-, medium-, and large-size firms are sampled together.

8 The reason for this is that the Gauss-Markov theorem provides the sufficient (but not necessary) condition for OLS to be efficient. The necessary and sufficient condition for OLS to be BLUE is given by Kruskal's theorem. But this topic is beyond the scope of this book. I am indebted to Michael McAleer for bringing this to my attention. For further details, see Denzil G. Fiebig, Michael McAleer, and Robert Bartels, "Properties of Ordinary Least Squares Estimators in Regression Models with Nonspherical Disturbances," Journal of Econometrics, vol. 54, nos. 1-3, Oct.-Dec. 1992, pp. 321-334. For the mathematically inclined student, I discuss this topic further in Appendix C, using matrix algebra.

9 S. J. Prais and H. S. Houthakker, The Analysis of Family Budgets, Cambridge University Press, New York, 1955.

As a matter of fact, we have already come across examples of this. In Chapter 2 we discussed the relationship between mean, or average, hourly wages and years of schooling in the United States. In that chapter we also discussed the relationship between expenditure on food and total expenditure for 55 families in India (see Exercise 11.16).

Graphical Method 

If there is no a priori or empirical information about the nature of heteroscedasticity, in practice one can do the regression analysis on the assumption that there is no heteroscedasticity and then do a postmortem examination of the squared residuals $\hat u_i^2$ to see if they exhibit any systematic pattern. Although the $\hat u_i^2$ are not the same thing as the $u_i^2$, they can be used as proxies, especially if the sample size is sufficiently large.¹⁰ An examination of the $\hat u_i^2$ may reveal patterns such as those shown in Figure 11.8.

In Figure 11.8, the $\hat u_i^2$ are plotted against $\hat Y_i$, the estimated $Y_i$ from the regression line, the idea being to find out whether the estimated mean value of Y is systematically related to the squared residual. In Figure 11.8a we see that there is no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figures 11.8b to e, however, exhibit definite patterns. For instance, Figure 11.8c suggests a linear relationship, whereas Figures 11.8d and e indicate a quadratic relationship between $\hat u_i^2$ and $\hat Y_i$. Using such knowledge, albeit informal, one may transform the data in such a manner that the transformed data do not exhibit heteroscedasticity. In Section 11.6 we shall examine several such transformations.

Instead of plotting $\hat{u}_i^2$ against $\hat{Y}_i$, one may plot them against one of the explanatory variables, especially if plotting $\hat{u}_i^2$ against $\hat{Y}_i$ results in the pattern shown in Figure 11.8a. Such a plot, which is shown in Figure 11.9, may reveal patterns similar to those given in Figure 11.8. (In the case of the two-variable model, plotting $\hat{u}_i^2$ against $\hat{Y}_i$ is equivalent to plotting it against $X_i$, and therefore Figure 11.9 is similar to Figure 11.8. But this is not the situation when we consider a model involving two or more X variables; in this instance, $\hat{u}_i^2$ may be plotted against any X variable included in the model.)

FIGURE 11.8 Hypothetical patterns of estimated squared residuals, panels (a) to (e).

FIGURE 11.9 Scattergram of estimated squared residuals against X.

10 For the relationship between $\hat{u}_i$ and $u_i$, see E. Malinvaud, Statistical Methods of Econometrics, North Holland Publishing Company, Amsterdam, 1970, pp. 88-89.

A pattern such as that shown in Figure 11.9c, for instance, suggests that the variance of the disturbance term is linearly related to the X variable. Thus, if in the regression of savings on income one finds a pattern such as that shown in Figure 11.9c, it suggests that the heteroscedastic variance may be proportional to the value of the income variable. This knowledge may help us in transforming our data in such a manner that in the regression on the transformed data the variance of the disturbance is homoscedastic. We shall return to this topic in the next section.

Formal Methods 

Park Test¹¹

Park formalizes the graphical method by suggesting that $\sigma_i^2$ is some function of the explanatory variable $X_i$. The functional form he suggests is

$$\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i}$$

or

$$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i \qquad (11.5.1)$$

where $v_i$ is the stochastic disturbance term.


11 R. E. Park, "Estimation with Heteroscedastic Error Terms," Econometrica, vol. 34, no. 4, October 1966, p. 888. The Park test is a special case of the general test proposed by A. C. Harvey in "Estimating Regression Models with Multiplicative Heteroscedasticity," Econometrica, vol. 44, no. 3, 1976, pp. 461-465.
















Chapter 11 Heteroscedasticity: What Happens If the Error Variance Is Nonconstant? 379 



Since $\sigma_i^2$ is generally not known, Park suggests using $\hat{u}_i^2$ as a proxy and running the following regression:

$$\ln \hat{u}_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i = \alpha + \beta \ln X_i + v_i \qquad (11.5.2)$$

If $\beta$ turns out to be statistically significant, it would suggest that heteroscedasticity is present in the data. If it turns out to be insignificant, we may accept the assumption of homoscedasticity. The Park test is thus a two-stage procedure. In the first stage we run the OLS regression disregarding the heteroscedasticity question. We obtain $\hat{u}_i$ from this regression, and then in the second stage we run the regression (11.5.2).
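The two-stage Park procedure can be sketched in plain Python (standard library only). This is a hedged illustration rather than code from the text: the helper names `ols_simple` and `park_test` and the data are hypothetical, the sketch covers only the two-variable model with $X_i > 0$, and it assumes no residual is exactly zero, since $\ln \hat{u}_i^2$ would then be undefined.

```python
import math


def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, standard error of slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, se_slope


def park_test(x, y):
    """Park test: returns (beta_hat, t) from regressing ln(u_hat^2) on ln X, Eq. (11.5.2)."""
    # Stage 1: OLS that ignores heteroscedasticity; keep the residuals u_hat_i.
    b1, b2, _ = ols_simple(x, y)
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    # Stage 2: ln(u_hat_i^2) = alpha + beta * ln(X_i) + v_i.
    ln_u2 = [math.log(u ** 2) for u in resid]
    ln_x = [math.log(xi) for xi in x]
    _, beta, se_beta = ols_simple(ln_x, ln_u2)
    return beta, beta / se_beta


# Hypothetical data whose disturbances grow in size with X.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
e = [0.5, -0.6, 0.9, -1.2, 1.4, -1.9, 2.2, -2.6, 3.0, -3.5]
y = [1 + 2 * xi + ei for xi, ei in zip(x, e)]
beta, t = park_test(x, y)
print(f"beta = {beta:.3f}, t = {t:.3f}")
```

With these made-up numbers the estimated $\beta$ is positive, as one would expect when the spread of the errors rises with X; its significance is judged from t with n - 2 df, subject to the Goldfeld-Quandt caveat discussed next.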

Although empirically appealing, the Park test has some problems. Goldfeld and Quandt have argued that the error term $v_i$ entering into Eq. (11.5.2) may not satisfy the OLS assumptions and may itself be heteroscedastic.¹² Nonetheless, as a strictly exploratory method, one may use the Park test.

EXAMPLE 11.1 Relationship between Compensation and Productivity

To illustrate the Park approach, we use the data given in Table 11.1 to run the following regression:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

where Y = average compensation in thousands of dollars, X = average productivity in thousands of dollars, and i = ith employment size of the establishment. The results of the regression are as follows:

$$\hat{Y}_i = 1992.3452 + 0.2329 X_i \qquad (11.5.3)$$

se = (936.4791) (0.0998)

t = (2.1275) (2.333)    $R^2 = 0.4375$

The results reveal that the estimated slope coefficient is significant at the 5 percent level on the basis of a one-tailed t test. The equation shows that as labor productivity increases by, say, a dollar, labor compensation on the average increases by about 23 cents.

The logs of the squared residuals obtained from regression (11.5.3) are then regressed on $\ln X_i$, as suggested in Eq. (11.5.2), giving the following results:

$$\widehat{\ln \hat{u}_i^2} = 35.817 - 2.8099 \ln X_i \qquad (11.5.4)$$

se = (38.319) (4.216)

t = (0.934) (-0.667)    $R^2 = 0.0595$

Obviously, there is no statistically significant relationship between the two variables. Following the Park test, one may conclude that there is no heteroscedasticity in the error variance.¹³


Glejser Test¹⁴

The Glejser test is similar in spirit to the Park test. After obtaining the residuals $\hat{u}_i$ from the OLS regression, Glejser suggests regressing the absolute values of $\hat{u}_i$ on the X variable that

12 Stephen M. Goldfeld and Richard E. Quandt, Nonlinear Methods in Econometrics, North Holland 
Publishing Company, Amsterdam, 1972, pp. 93-94. 

13 The particular functional form chosen by Park is only suggestive. A different functional form may reveal significant relationships. For example, one may use $\hat{u}_i^2$ instead of $\ln \hat{u}_i^2$ as the dependent variable.

14 H. Glejser, "A New Test for Heteroscedasticity," Journal of the American Statistical Association, vol. 64, 1969, pp. 316-323.
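is thought to be closely associated with $\sigma_i^2$. In his experiments, Glejser uses the following functional forms:

$$|\hat{u}_i| = \beta_1 + \beta_2 X_i + v_i$$

$$|\hat{u}_i| = \beta_1 + \beta_2 \sqrt{X_i} + v_i$$

$$|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{X_i} + v_i$$

$$|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} + v_i$$

$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$$

$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$$

where $v_i$ is the error term.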

Again as an empirical or practical matter, one may use the Glejser approach. But Goldfeld and Quandt point out that the error term $v_i$ has some problems in that its expected value is nonzero, it is serially correlated (see Chapter 12), and, ironically, it is heteroscedastic.¹⁵ An additional difficulty with the Glejser method is that models such as

$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$$

and

$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$$

are nonlinear in the parameters and therefore cannot be estimated with the usual OLS procedure.

Glejser has found that for large samples the first four of the preceding models give generally satisfactory results in detecting heteroscedasticity. As a practical matter, therefore, the Glejser technique may be used for large samples, and in small samples it may be used strictly as a qualitative device to learn something about heteroscedasticity.
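The four Glejser forms that are linear in the parameters can all be run by OLS once the residuals are in hand. The following is a minimal Python sketch under the same assumptions as before (two-variable model, $X_i > 0$); the names `glejser_test` and `GLEJSER_FORMS` and the data are hypothetical, and the two square-root forms are left out because, as just noted, they would need nonlinear estimation.

```python
import math


def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, standard error of slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, math.sqrt(rss / (n - 2) / sxx)


# Transforms of X for the four forms that are linear in beta1 and beta2.
GLEJSER_FORMS = {
    "X": lambda v: v,
    "sqrt(X)": lambda v: math.sqrt(v),
    "1/X": lambda v: 1 / v,
    "1/sqrt(X)": lambda v: 1 / math.sqrt(v),
}


def glejser_test(x, y):
    """Regress |u_hat_i| on each transform of X; returns {form: (beta2_hat, t)}."""
    b1, b2, _ = ols_simple(x, y)
    abs_resid = [abs(yi - b1 - b2 * xi) for xi, yi in zip(x, y)]
    results = {}
    for name, f in GLEJSER_FORMS.items():
        _, slope, se = ols_simple([f(xi) for xi in x], abs_resid)
        results[name] = (slope, slope / se)
    return results


# Hypothetical data whose disturbances grow in size with X.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
e = [0.5, -0.6, 0.9, -1.2, 1.4, -1.9, 2.2, -2.6, 3.0, -3.5]
y = [1 + 2 * xi + ei for xi, ei in zip(x, e)]
for form, (b, t) in glejser_test(x, y).items():
    print(f"|u_hat| on {form:10s} slope = {b:8.3f}  t = {t:6.2f}")
```

Because the t ratios here come from a second-stage regression whose error has the problems listed by Goldfeld and Quandt, they are best read as the qualitative signal the text describes, not as exact tests.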


EXAMPLE 11.2 Relationship between Compensation and Productivity: The Glejser Test

Continuing with Example 11.1, the absolute values of the residuals obtained from regression (11.5.3) were regressed on average productivity (X), giving the following results:

$$\widehat{|\hat{u}_i|} = 407.2783 - 0.0203 X_i \qquad (11.5.5)$$

se = (633.1621) (0.0675)    $r^2 = 0.0127$

t = (0.6432) (-0.3012)

As you can see from this regression, there is no statistically significant relationship between the absolute values of the residuals and the regressor, average productivity. This reinforces the conclusion based on the Park test.


Spearman's Rank Correlation Test

In Exercise 3.8 we defined Spearman's rank correlation coefficient as

$$r_s = 1 - 6\left[\frac{\sum d_i^2}{n(n^2 - 1)}\right] \qquad (11.5.6)$$


15 For details, see Goldfeld and Quandt, op. cit., Chapter 3. 
where $d_i$ = difference in the ranks assigned to two different characteristics of the ith individual or phenomenon and n = number of individuals or phenomena ranked. The preceding rank correlation coefficient can be used to detect heteroscedasticity as follows. Assume $Y_i = \beta_0 + \beta_1 X_i + u_i$.

Step 1. Fit the regression to the data on Y and X and obtain the residuals $\hat{u}_i$.

Step 2. Ignoring the sign of $\hat{u}_i$, that is, taking their absolute value $|\hat{u}_i|$, rank both $|\hat{u}_i|$ and $X_i$ (or $\hat{Y}_i$) according to an ascending or descending order and compute the Spearman's rank correlation coefficient given previously.

Step 3. Assuming that the population rank correlation coefficient $\rho_s$ is zero and n > 8, the significance of the sample $r_s$ can be tested by the t test as follows:¹⁶

$$t = \frac{r_s\sqrt{n - 2}}{\sqrt{1 - r_s^2}} \qquad (11.5.7)$$

with df = n - 2.

If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity; otherwise we may reject it. If the regression model involves more than one X variable, $r_s$ can be computed between $|\hat{u}_i|$ and each of the X variables separately and can be tested for statistical significance by the t test given in Eq. (11.5.7).


EXAMPLE 11.3 Illustration of the Rank Correlation Test

To illustrate the rank correlation test, consider the data given in Table 11.2. The data pertain to the average annual return (E, %) and the standard deviation of annual return (σ, %) of 10 mutual funds.


TABLE 11.2 Rank Correlation Test of Heteroscedasticity

Name of Mutual Fund              E_i    σ_i    Ê_i†    |û_i|*   Rank of   Rank of   d_i   d_i²
                                                                 |û_i|     σ_i
Boston Fund                      12.4   12.1   11.37   1.03      9         4         5    25
Delaware Fund                    14.4   21.4   15.64   1.24     10         9         1     1
Equity Fund                      14.6   18.7   14.40   0.20      4         7        -3     9
Fundamental Investors            16.0   21.7   15.78   0.22      5        10        -5    25
Investors Mutual                 11.3   12.5   11.56   0.26      6         5         1     1
Loomis-Sales Mutual Fund         10.0   10.4   10.59   0.59      7         2         5    25
Massachusetts Investors Trust    16.2   20.8   15.37   0.83      8         8         0     0
New England Fund                 10.4   10.2   10.50   0.10      3         1         2     4
Putnam Fund of Boston            13.1   16.0   13.16   0.06      2         6        -4    16
Wellington Fund                  11.3   12.0   11.33   0.03      1         3        -2     4
Total                                                                                0   110

Column key: E_i = average annual return, %; σ_i = standard deviation of annual return, %; |û_i| = |E_i - Ê_i|, the absolute residuals; d_i = difference between the two rankings.

† Obtained from the regression: Ê_i = 5.8194 + 0.4590σ_i.

* Absolute value of the residuals.

Note: The ranking is in ascending order of values. (Continued)


16 See G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Charles Griffin & Company, London, 1953, p. 455.







382 Part Two Relaxing the Assumptions of the Classical Model 


EXAMPLE 11.3 (Continued)


The capital market line (CML) of portfolio theory postulates a linear relationship between expected return ($E_i$) and risk (as measured by the standard deviation, σ) of a portfolio as follows:

$$E_i = \beta_1 + \beta_2 \sigma_i$$

Using the data in Table 11.2, the preceding model was estimated and the residuals from 
this model were computed. Since the data relate to 10 mutual funds of differing sizes and 
investment goals, a priori one might expect heteroscedasticity. To test this hypothesis, we 
apply the rank correlation test. The necessary calculations are given in Table 11.2. 

Applying formula (11.5.6), we obtain

$$r_s = 1 - 6\left[\frac{110}{10(100 - 1)}\right] = 0.3333 \qquad (11.5.8)$$

Applying the t test given in Eq. (11.5.7), we obtain

$$t = \frac{(0.3333)\sqrt{8}}{\sqrt{1 - 0.1110}} = 0.9998 \qquad (11.5.9)$$

For 8 df this t value is not significant even at the 10 percent level of significance; the (one-tailed) p value is about 0.17. Thus, there is no evidence of a systematic relationship between the explanatory variable and the absolute values of the residuals, which might suggest that there is no heteroscedasticity.
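Equations (11.5.6) and (11.5.7) can be checked against this example with a short Python sketch. The helper names `ranks` and `spearman_het_test` are hypothetical; the inputs are the $|\hat{u}_i|$ and $\sigma_i$ columns copied from Table 11.2 in row order, and tied values (absent here) would receive average ranks.

```python
import math


def ranks(values):
    """Ascending ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for pos in range(i, j + 1):
            r[order[pos]] = avg
        i = j + 1
    return r


def spearman_het_test(abs_resid, x):
    """Spearman r_s between |u_hat| and X, Eq. (11.5.6), and its t, Eq. (11.5.7)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(abs_resid), ranks(x)))
    rs = 1 - 6 * d2 / (n * (n ** 2 - 1))
    t = rs * math.sqrt(n - 2) / math.sqrt(1 - rs ** 2)
    return rs, t, n - 2


# |u_hat_i| and sigma_i for the 10 funds, in the row order of Table 11.2.
abs_u = [1.03, 1.24, 0.20, 0.22, 0.26, 0.59, 0.83, 0.10, 0.06, 0.03]
sigma = [12.1, 21.4, 18.7, 21.7, 12.5, 10.4, 20.8, 10.2, 16.0, 12.0]
rs, t, df = spearman_het_test(abs_u, sigma)
print(f"r_s = {rs:.4f}, t = {t:.4f}, df = {df}")  # r_s = 0.3333, t = 1.0000, df = 8
```

Carried at full precision, $r_s = 1/3$ gives t = 1 exactly; the 0.9998 in Eq. (11.5.9) reflects rounding $r_s$ to 0.3333 before squaring it.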


Goldfeld-Quandt Test¹⁷

This popular method is applicable if one assumes that the heteroscedastic variance, $\sigma_i^2$, is positively related to one of the explanatory variables in the regression model. For simplicity, consider the usual two-variable model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

Suppose $\sigma_i^2$ is positively related to $X_i$ as

$$\sigma_i^2 = \sigma^2 X_i^2 \qquad (11.5.10)$$

where $\sigma^2$ is a constant.¹⁸

Assumption (11.5.10) postulates that $\sigma_i^2$ is proportional to the square of the X variable. Such an assumption has been found quite useful by Prais and Houthakker in their study of family budgets. (See Section 11.5, informal methods.)

If Eq. (11.5.10) is appropriate, it would mean $\sigma_i^2$ would be larger, the larger the values of $X_i$. If that turns out to be the case, heteroscedasticity is most likely to be present in the model. To test this explicitly, Goldfeld and Quandt suggest the following steps:

Step 1. Order or rank the observations according to the values of $X_i$, beginning with the lowest X value.

Step 2. Omit c central observations, where c is specified a priori, and divide the remaining (n - c) observations into two groups each of (n - c)/2 observations.

Step 3. Fit separate OLS regressions to the first (n - c)/2 observations and the last (n - c)/2 observations, and obtain the respective residual sums of squares $RSS_1$ and

17 Goldfeld and Quandt, op. cit., Chapter 3. 

18 This is only one plausible assumption. Actually, what is required is that $\sigma_i^2$ be monotonically related to $X_i$.
$RSS_2$, $RSS_1$ representing the RSS from the regression corresponding to the smaller $X_i$ values (the small variance group) and $RSS_2$ that from the larger $X_i$ values (the large variance group). These RSS each have

$$\frac{n - c}{2} - k \quad \text{or} \quad \left(\frac{n - c - 2k}{2}\right) \text{ df}$$

where k is the number of parameters to be estimated, including the intercept. (Why?) For the two-variable case k is of course 2.

Step 4. Compute the ratio

$$\lambda = \frac{RSS_2/\text{df}}{RSS_1/\text{df}} \qquad (11.5.11)$$

If we assume $u_i$ are normally distributed (which we usually do), and if the assumption of homoscedasticity is valid, then it can be shown that $\lambda$ of Eq. (11.5.11) follows the F distribution with numerator and denominator df each of (n - c - 2k)/2.

If in an application the computed $\lambda$ (= F) is greater than the critical F at the chosen level of significance, we can reject the hypothesis of homoscedasticity, that is, we can say that heteroscedasticity is very likely.

Before illustrating the test, a word about omitting the c central observations is in order. These observations are omitted to sharpen or accentuate the difference between the small variance group (i.e., $RSS_1$) and the large variance group (i.e., $RSS_2$). But the ability of the Goldfeld-Quandt test to do this successfully depends on how c is chosen.¹⁹ For the two-variable model the Monte Carlo experiments done by Goldfeld and Quandt suggest that c is about 8 if the sample size is about 30, and it is about 16 if the sample size is about 60. But Judge et al. note that c = 4 if n = 30 and c = 10 if n is about 60 have been found satisfactory in practice.²⁰

Before moving on, it may be noted that in case there is more than one X variable in the model, the ranking of observations, the first step in the test, can be done according to any one of them. Thus in the model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$, we can rank-order the data according to any one of these X's. If a priori we are not sure which X variable is appropriate, we can conduct the test on each of the X variables, or via a Park test, in turn, on each X.


EXAMPLE 11.4 The Goldfeld-Quandt Test

To illustrate the Goldfeld-Quandt test, we present in Table 11.3 data on consumption 
expenditure in relation to income for a cross section of 30 families. Suppose we postulate 
that consumption expenditure is linearly related to income but that heteroscedasticity is 
present in the data. We further postulate that the nature of heteroscedasticity is as given in 
Eq. (11.5.10). The necessary reordering of the data for the application of the test is also 
presented in Table 11.3. 

Dropping the middle 4 observations, the OLS regressions based on the first 13 and the last 13 observations and their associated residual sums of squares are as shown next (standard errors in the parentheses).

( Continued ) 


19 Technically, the power of the test depends on how c is chosen. In statistics, the power of a test is measured by the probability of rejecting the null hypothesis when it is false (i.e., by 1 - Prob [type II error]). Here the null hypothesis is that the variances of the two groups are the same, i.e., homoscedasticity. For further discussion, see M. M. Ali and C. Giaccotto, "A Study of Several New and Existing Tests for Heteroscedasticity in the General Linear Model," Journal of Econometrics, vol. 26, 1984, pp. 355-373.

20 George G. Judge, R. Carter Hill, William E. Griffiths, Helmut Lütkepohl, and Tsoung-Chao Lee, Introduction to the Theory and Practice of Econometrics, John Wiley & Sons, New York, 1982, p. 422.
EXAMPLE 11.4 (Continued)


TABLE 11.3 Hypothetical Data on Consumption Expenditure Y($) and Income X($) to Illustrate the Goldfeld-Quandt Test

Original data       Data ranked by X values
  Y      X            Y      X
 55     80           55     80
 65    100           70     85
 70     85           75     90
 80    110           65    100
 79    120           74    105
 84    115           80    110
 98    130           84    115
 95    140           79    120
 90    125           90    125
 75     90           98    130
 74    105           95    140
110    160          108    145
113    150          113    150
125    165          110    160   (middle 4 observations)
108    145          125    165   (middle 4 observations)
115    180          115    180   (middle 4 observations)
140    225          130    185   (middle 4 observations)
120    200          135    190
145    240          120    200
130    185          140    205
152    220          144    210
144    210          152    220
175    245          140    225
180    260          137    230
135    190          145    240
140    205          175    245
178    265          189    250
191    270          180    260
137    230          178    265
189    250          191    270

Regression based on the first 13 observations:

$$\hat{Y}_i = 3.4094 + 0.6968 X_i$$

(8.7049) (0.0744)    $r^2 = 0.8887$    $RSS_1 = 377.17$    df = 11

Regression based on the last 13 observations:

$$\hat{Y}_i = -28.0272 + 0.7941 X_i$$

(30.6421) (0.1319)    $r^2 = 0.7681$    $RSS_2 = 1536.8$    df = 11

From these results we obtain

$$\lambda = \frac{RSS_2/\text{df}}{RSS_1/\text{df}} = \frac{1536.8/11}{377.17/11} = 4.07$$

The critical F value for 11 numerator and 11 denominator df at the 5 percent level is 2.82. Since the estimated F (= λ) value exceeds the critical value, we may conclude that there is heteroscedasticity in the error variance. However, if the level of significance is fixed at 1 percent, we may not reject the assumption of homoscedasticity. (Why?) Note that the p value of the observed λ is 0.014.
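The four steps of the Goldfeld-Quandt test can be reproduced on the Table 11.3 data with a small Python sketch. It is a hedged illustration, not the author's code: the helper names `rss_simple` and `goldfeld_quandt` are hypothetical, the sketch assumes n - c is even and that the X values are distinct (so sorting by X is unambiguous), and it leaves the F critical value to a table such as the 2.82 quoted above.

```python
def rss_simple(x, y):
    """Residual sum of squares from OLS of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return syy - sxy ** 2 / sxx


def goldfeld_quandt(x, y, c, k=2):
    """Returns (lambda, RSS1, RSS2, df) as in Eq. (11.5.11)."""
    pairs = sorted(zip(x, y))            # Step 1: order the observations by X
    m = (len(pairs) - c) // 2            # Step 2: size of each group after omitting c
    low, high = pairs[:m], pairs[-m:]    # small- and large-variance groups
    rss1 = rss_simple([p[0] for p in low], [p[1] for p in low])     # Step 3
    rss2 = rss_simple([p[0] for p in high], [p[1] for p in high])
    df = m - k
    return (rss2 / df) / (rss1 / df), rss1, rss2, df                # Step 4


# Table 11.3, original (unranked) order.
Y = [55, 65, 70, 80, 79, 84, 98, 95, 90, 75, 74, 110, 113, 125, 108,
     115, 140, 120, 145, 130, 152, 144, 175, 180, 135, 140, 178, 191, 137, 189]
X = [80, 100, 85, 110, 120, 115, 130, 140, 125, 90, 105, 160, 150, 165, 145,
     180, 225, 200, 240, 185, 220, 210, 245, 260, 190, 205, 265, 270, 230, 250]

lam, rss1, rss2, df = goldfeld_quandt(X, Y, c=4)
print(f"lambda = {lam:.2f}, RSS1 = {rss1:.2f}, RSS2 = {rss2:.2f}, df = {df}")
# lambda = 4.07, RSS1 = 377.17, RSS2 = 1536.80, df = 11
```

Sorting (X, Y) pairs, rather than sorting X alone, keeps each consumption figure attached to its own income, which is exactly the reordering shown in the right-hand columns of Table 11.3.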
Breusch-Pagan-Godfrey Test²¹

The success of the Goldfeld-Quandt test depends not only on the value of c (the number of central observations to be omitted) but also on identifying the correct X variable with which to order the observations. This limitation can be avoided if we use the Breusch-Pagan-Godfrey (BPG) test.

To illustrate this test, consider the k-variable linear regression model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i \qquad (11.5.12)$$

Assume that the error variance $\sigma_i^2$ is described as

$$\sigma_i^2 = f(\alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi}) \qquad (11.5.13)$$

that is, $\sigma_i^2$ is some function of the nonstochastic Z variables; some or all of the X's can serve as Z's. Specifically, assume that

$$\sigma_i^2 = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} \qquad (11.5.14)$$

that is, $\sigma_i^2$ is a linear function of the Z's. If $\alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$, $\sigma_i^2 = \alpha_1$, which is a constant. Therefore, to test whether $\sigma_i^2$ is homoscedastic, one can test the hypothesis that $\alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$. This is the basic idea behind the Breusch-Pagan-Godfrey test.

The actual test procedure is as follows. 

Step 1. Estimate Eq. (11.5.12) by OLS and obtain the residuals $\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_n$.

Step 2. Obtain $\tilde{\sigma}^2 = \sum \hat{u}_i^2 / n$. Recall from Chapter 4 that this is the maximum likelihood (ML) estimator of $\sigma^2$. (Note: The OLS estimator is $\sum \hat{u}_i^2/(n - k)$.)

Step 3. Construct variables $p_i$ defined as

$$p_i = \hat{u}_i^2 / \tilde{\sigma}^2$$

which is simply each residual squared divided by $\tilde{\sigma}^2$.

Step 4. Regress $p_i$ thus constructed on the Z's as

$$p_i = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} + v_i \qquad (11.5.15)$$

where $v_i$ is the residual term of this regression.

Step 5. Obtain the ESS (explained sum of squares) from Eq. (11.5.15) and define

$$\Theta = \tfrac{1}{2}(\text{ESS}) \qquad (11.5.16)$$

Assuming $u_i$ are normally distributed, one can show that if there is homoscedasticity and if the sample size n increases indefinitely, then

$$\Theta \overset{\text{asy}}{\sim} \chi^2_{m-1} \qquad (11.5.17)$$

that is, Θ follows the chi-square distribution with (m - 1) degrees of freedom. (Note: asy means asymptotically.)


21 T. Breusch and A. Pagan, "A Simple Test for Heteroscedasticity and Random Coefficient Variation," Econometrica, vol. 47, 1979, pp. 1287-1294. See also L. Godfrey, "Testing for Multiplicative Heteroscedasticity," Journal of Econometrics, vol. 8, 1978, pp. 227-236. Because of similarity, these tests are known as Breusch-Pagan-Godfrey tests of heteroscedasticity.
Therefore, if in an application the computed Θ (= χ²) exceeds the critical χ² value at the chosen level of significance, one can reject the hypothesis of homoscedasticity; otherwise one does not reject it.

The reader may wonder why BPG chose ½ESS as the test statistic. The reasoning is slightly involved and is left for the references.²²

EXAMPLE 11.5 The Breusch-Pagan-Godfrey (BPG) Test

As an example, let us revisit the data (Table 11.3) that were used to illustrate the Goldfeld-Quandt heteroscedasticity test. Regressing Y on X, we obtain the following:

Step 1.

$$\hat{Y}_i = 9.2903 + 0.6378 X_i \qquad (11.5.18)$$

se = (5.2314) (0.0286)    RSS = 2361.153    $R^2 = 0.9466$

Step 2.

$$\tilde{\sigma}^2 = \sum \hat{u}_i^2 / 30 = 2361.153/30 = 78.7051$$

Step 3. Divide the squared residuals $\hat{u}_i^2$ obtained from regression (11.5.18) by 78.7051 to construct the variable $p_i$.

Step 4. Assuming that $p_i$ are linearly related to $X_i$ (= $Z_i$) as per Eq. (11.5.14), we obtain the regression

$$\hat{p}_i = -0.7426 + 0.0101 X_i \qquad (11.5.19)$$

se = (0.7529) (0.0041)    ESS = 10.4280    $R^2 = 0.18$

Step 5.

$$\Theta = \tfrac{1}{2}(\text{ESS}) = 5.2140 \qquad (11.5.20)$$

Under the assumptions of the BPG test, Θ in Eq. (11.5.20) asymptotically follows the chi-square distribution with 1 df. (Note: There is only one regressor in Eq. [11.5.19].) Now from the chi-square table we find that for 1 df the 5 percent critical chi-square value is 3.8414 and the 1 percent critical χ² value is 6.6349. Thus, the observed chi-square value of 5.2140 is significant at the 5 percent but not the 1 percent level of significance. Therefore, we reach the same conclusion as the Goldfeld-Quandt test. But keep in mind that, strictly speaking, the BPG test is an asymptotic, or large-sample, test and in the present example 30 observations may not constitute a large sample. It should also be pointed out that in small samples the test is sensitive to the assumption that the disturbances $u_i$ are normally distributed. Of course, we can test the normality assumption by the tests discussed in Chapter 5.²³
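The five BPG steps for Example 11.5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the text's own procedure in code: the names `ols_simple` and `bpg_test` are hypothetical, the sketch allows only a single Z variable (so m - 1 = 1 df), and it uses the identity that the upper tail of a chi-square with 1 df at θ equals `math.erfc(math.sqrt(theta / 2))`.

```python
import math


def ols_simple(x, y):
    """Returns (intercept, slope) from OLS of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def bpg_test(x, y, z):
    """BPG Theta = ESS/2 with one Z variable, plus its asymptotic chi-square(1) p value."""
    n = len(y)
    b1, b2 = ols_simple(x, y)                               # Step 1
    u2 = [(yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)]
    sigma2_ml = sum(u2) / n                                 # Step 2: ML estimator
    p = [v / sigma2_ml for v in u2]                         # Step 3
    a1, a2 = ols_simple(z, p)                               # Step 4
    pbar = sum(p) / n
    ess = sum((a1 + a2 * zi - pbar) ** 2 for zi in z)       # Step 5
    theta = ess / 2
    p_value = math.erfc(math.sqrt(theta / 2))               # P(chi-square with 1 df > theta)
    return theta, p_value


# Table 11.3, original order; Z is taken to be X itself, as in the example.
Y = [55, 65, 70, 80, 79, 84, 98, 95, 90, 75, 74, 110, 113, 125, 108,
     115, 140, 120, 145, 130, 152, 144, 175, 180, 135, 140, 178, 191, 137, 189]
X = [80, 100, 85, 110, 120, 115, 130, 140, 125, 90, 105, 160, 150, 165, 145,
     180, 225, 200, 240, 185, 220, 210, 245, 260, 190, 205, 265, 270, 230, 250]

theta, p_value = bpg_test(X, Y, X)
print(f"Theta = {theta:.4f}, asymptotic p value = {p_value:.4f}")
```

On these data the sketch should reproduce Θ ≈ 5.214 of Eq. (11.5.20), with an asymptotic p value a little above 0.02, which matches the text's verdict of significance at 5 but not 1 percent. The small-sample caveats in the paragraph above apply equally to this p value.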


White's General Heteroscedasticity Test

Unlike the Goldfeld-Quandt test, which requires reordering the observations with respect to the X variable that supposedly caused heteroscedasticity, or the BPG test, which is sensitive to the normality assumption, the general test of heteroscedasticity proposed by White


22 See Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar, Cheltenham, U.K., 1994, pp. 178-179.

23 On this, see R. Koenker, "A Note on Studentizing a Test for Heteroscedasticity," Journal of Econometrics, vol. 17, 1981, pp. 107-112.
does not rely on the normality assumption and is easy to implement.²⁴ As an illustration of the basic idea, consider the following three-variable regression model (the generalization to the k-variable model is straightforward):

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (11.5.21)$$

The White test proceeds as follows:

Step 1. Given the data, we estimate Eq. (11.5.21) and obtain the residuals, $\hat{u}_i$.

Step 2. We then run the following (auxiliary) regression:²⁵

$$\hat{u}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + v_i \qquad (11.5.22)$$

That is, the squared residuals from the original regression are regressed on the original X variables or regressors, their squared values, and the cross product(s) of the regressors. Higher powers of regressors can also be introduced. Note that there is a constant term in this equation even though the original regression may or may not contain it. Obtain the $R^2$ from this (auxiliary) regression.

Step 3. Under the null hypothesis that there is no heteroscedasticity, it can be shown that sample size (n) times the $R^2$ obtained from the auxiliary regression asymptotically follows the chi-square distribution with df equal to the number of regressors (excluding the constant term) in the auxiliary regression. That is,

$$n \cdot R^2 \overset{\text{asy}}{\sim} \chi^2_{\text{df}} \qquad (11.5.23)$$

where df is as defined previously. In our example, there are 5 df since there are 5 regressors in the auxiliary regression.

Step 4. If the chi-square value obtained in Eq. (11.5.23) exceeds the critical chi-square value at the chosen level of significance, the conclusion is that there is heteroscedasticity. If it does not exceed the critical chi-square value, there is no heteroscedasticity, which is to say that in the auxiliary regression (11.5.22), $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0$ (see footnote 25).
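The White procedure for the three-variable model can be sketched in plain Python. The sketch is hedged: `solve_ols` and `white_test` are hypothetical names, OLS is done by solving the normal equations with Gauss-Jordan elimination (adequate for a handful of regressors, not a numerically robust general solver), and the data in the usage lines are made up purely to show the call.

```python
def solve_ols(X, y):
    """OLS of y on the columns of X (include a column of ones); returns (coefs, R^2)."""
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ybar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return b, 1 - rss / tss


def white_test(x2, x3, y):
    """White's test for Y = b1 + b2*X2 + b3*X3 + u; returns (n * R^2, df)."""
    n = len(y)
    X = [[1.0, p, q] for p, q in zip(x2, x3)]
    b, _ = solve_ols(X, y)                                        # Step 1
    u2 = [(yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
          for row, yi in zip(X, y)]
    # Step 2: auxiliary regression (11.5.22) on levels, squares, and the cross product.
    Z = [[1.0, p, q, p * p, q * q, p * q] for p, q in zip(x2, x3)]
    _, r2 = solve_ols(Z, u2)
    df = len(Z[0]) - 1                                            # regressors excluding the constant
    return n * r2, df                                             # Step 3: compare with chi-square(df)


# Made-up data, only to show the call.
x2 = [float(i) for i in range(1, 13)]
x3 = [2.0, 3.5, 1.0, 4.0, 2.5, 5.0, 3.0, 6.5, 4.5, 7.0, 5.5, 8.0]
e = [0.3, -0.1, 0.4, -0.5, 0.2, 0.6, -0.4, 0.1, -0.2, 0.5, -0.6, 0.3]
y = [1 + 0.5 * p - 0.3 * q + ei for p, q, ei in zip(x2, x3, e)]
stat, df = white_test(x2, x3, y)
print(f"n*R^2 = {stat:.3f} with {df} df")
```

The auxiliary design already has six columns for just two regressors, which illustrates why the text warns below that the White test can quickly consume degrees of freedom.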


EXAMPLE 11.6 White's Heteroscedasticity Test


( Continued ) 

24 H. White, "A Heteroscedasticity Consistent Covariance Matrix Estimator and a Direct Test of 
Heteroscedasticity," Econometrica, vol. 48, 1980, pp. 817-818. 

25 Implied in this procedure is the assumption that the error variance of $u_i$, $\sigma_i^2$, is functionally related to the regressors, their squares, and their cross products. If all the partial slope coefficients in this regression are simultaneously equal to zero, then the error variance is the homoscedastic constant equal to $\alpha_1$.

26 Stephen R. Lewis, "Government Revenue from Foreign Trade," Manchester School of Economics and Social Studies, vol. 31, 1963, pp. 39-47.


From cross-sectional data on 41 countries, Stephen Lewis estimated the following regression model:²⁶

$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \qquad (11.5.24)$$

where Y = ratio of trade taxes (import and export taxes) to total government revenue, $X_2$ = ratio of the sum of exports plus imports to GNP, and $X_3$ = GNP per capita; and ln stands for natural log. His hypotheses were that Y and $X_2$ would be positively related (the higher the trade volume, the higher the trade tax revenue) and that Y and $X_3$ would be
EXAMPLE 11.6 (Continued)


negatively related (as income increases, government finds it is easier to collect direct taxes, e.g., income tax, than it is to rely on trade taxes).

The empirical results supported the hypotheses. For our purpose, the important point is whether there is heteroscedasticity in the data. Since the data are cross-sectional involving a heterogeneity of countries, a priori one would expect heteroscedasticity in the error variance. By applying White's heteroscedasticity test to the residuals obtained from regression (11.5.24), the following results were obtained:²⁷

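$$\begin{aligned}\hat{u}_i^2 = {} & -5.8417 + 2.5629 \ln \text{Trade}_i + 0.6918 \ln \text{GNP}_i \\ & - 0.4081 (\ln \text{Trade}_i)^2 - 0.0491 (\ln \text{GNP}_i)^2 \\ & + 0.0015 (\ln \text{Trade}_i)(\ln \text{GNP}_i)\end{aligned} \qquad (11.5.25)$$

$R^2 = 0.1148$

Note: The standard errors are not given, as they are not pertinent for our purpose here.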

Now $n \cdot R^2$ = 41(0.1148) = 4.7068, which has, asymptotically, a chi-square distribution with 5 df (why?). The 5 percent critical chi-square value for 5 df is 11.0705, the 10 percent critical value is 9.2363, and the 25 percent critical value is 6.62568. For all practical purposes, one can conclude, on the basis of the White test, that there is no heteroscedasticity.
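The arithmetic of this example, and the critical values quoted, can be cross-checked with a few lines of Python. The helper `chi2_sf_df5` is hypothetical and relies on the closed-form upper tail of a chi-square distribution with 5 df, $P(\chi^2_5 > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}(1 + x/3)$; it is valid for 5 df only and is an assumption of this sketch, not a formula from the text.

```python
import math


def chi2_sf_df5(x):
    """Upper-tail probability P(chi-square with 5 df > x); closed form valid for 5 df only."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


n, r2 = 41, 0.1148           # sample size and auxiliary R^2 from Eq. (11.5.25)
stat = n * r2                # White statistic n * R^2, Eq. (11.5.23)
p_value = chi2_sf_df5(stat)
print(f"n*R^2 = {stat:.4f}, p value = {p_value:.3f}")
print(f"tail at 11.0705 = {chi2_sf_df5(11.0705):.3f}")  # about 0.05, the 5 percent critical value
```

The p value lands well above 0.25, consistent with the statistic falling below even the 25 percent critical value of 6.62568.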


A comment is in order regarding the White test. If a model has several regressors, then introducing all the regressors, their squared (or higher-powered) terms, and their cross products can quickly consume degrees of freedom. Therefore, one must use caution in using the test.²⁸

In cases where the White test statistic given in Eq. (11.5.23) is statistically significant, the cause may not necessarily be heteroscedasticity but specification errors, about which more will be said in Chapter 13 (recall point 5 of Section 11.1). In other words, the White test can be a test of (pure) heteroscedasticity or specification error or both. It has been argued that if no cross-product terms are present in the White test procedure, then it is a test of pure heteroscedasticity. If cross-product terms are present, then it is a test of both heteroscedasticity and specification bias.²⁹

Other Tests of Heteroscedasticity

There are several other tests of heteroscedasticity, each based on certain assumptions. The interested reader may want to consult the references.³⁰ We mention but one of these tests because of its simplicity. This is the Koenker-Bassett (KB) test. Like the Park, Breusch-Pagan-Godfrey, and White's tests of heteroscedasticity, the KB test is based on the squared residuals, $\hat{u}_i^2$, but instead of being regressed on one or more regressors, the squared residuals are regressed on the squared estimated values of the regressand. Specifically, if the original model is:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (11.5.26)$$

27 These results, with change in notation, are reproduced from William F. Lott and Subhash C. Ray, 
Applied Econometrics: Problems with Data Sets, Instructor's Manual, Chapter 22, pp. 137-140. 
28 Sometimes the test can be modified to conserve degrees of freedom. See Exercise 11.18. 

29 See Richard Harris, Using Cointegration Analysis in Econometric Modelling, Prentice Hall & Harvester Wheatsheaf, U.K., 1995, p. 68.

30 See M. J. Harrison and B. P. McCabe, "A Test for Heteroscedasticity Based on Ordinary Least Squares Residuals," Journal of the American Statistical Association, vol. 74, 1979, pp. 494-499; J. Szroeter, "A Class of Parametric Tests for Heteroscedasticity in Linear Econometric Models," Econometrica, vol. 46, 1978, pp. 1311-1327; M. A. Evans and M. L. King, "A Further Class of Tests for Heteroscedasticity," Journal of Econometrics, vol. 37, 1988, pp. 265-276; and R. Koenker and G. Bassett, "Robust Tests for Heteroscedasticity Based on Regression Quantiles," Econometrica, vol. 50, 1982, pp. 43-61.
you estimate this model, obtain $\hat{u}_i$ from this model, and then estimate

$$\hat{u}_i^2 = \alpha_1 + \alpha_2 (\hat{Y}_i)^2 + v_i \qquad (11.5.27)$$

where $\hat{Y}_i$ are the estimated values from the model (11.5.26). The null hypothesis is that $\alpha_2 = 0$. If this is not rejected, then one could conclude that there is no heteroscedasticity. The null hypothesis can be tested by the usual t test or the F test. (Note that $F_{1,k} = t_k^2$.) If the model (11.5.26) is double log, then the squared residuals are regressed on $(\log \hat{Y}_i)^2$. One other advantage of the KB test is that it is applicable even if the error term in the original model (11.5.26) is not normally distributed. If you apply the KB test to Example 11.1, you will find that the slope coefficient in the regression of the squared residuals obtained from Eq. (11.5.3) on the squared estimated values $\hat{Y}_i^2$ from Eq. (11.5.3) is statistically not different from zero, thus reinforcing the Park test. This result should not be surprising since in the present instance we only have a single regressor. But the KB test is applicable whether there is one regressor or many.
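Regression (11.5.27) is easy to sketch once the fitted values are available. For brevity this hypothetical Python sketch (`koenker_bassett` and the data are illustrative names and numbers) handles only the one-regressor case; with several regressors only the first-stage fit would change, since the second stage always uses $\hat{Y}_i^2$ as its single regressor.

```python
import math


def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, standard error of slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, math.sqrt(rss / (n - 2) / sxx)


def koenker_bassett(x, y):
    """KB test: regress u_hat^2 on Y_hat^2, Eq. (11.5.27); returns (alpha2_hat, t)."""
    b1, b2, _ = ols_simple(x, y)
    fitted = [b1 + b2 * xi for xi in x]
    u2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    _, a2, se_a2 = ols_simple([f ** 2 for f in fitted], u2)
    return a2, a2 / se_a2


# Hypothetical data whose disturbances grow in size with X.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
e = [0.5, -0.6, 0.9, -1.2, 1.4, -1.9, 2.2, -2.6, 3.0, -3.5]
y = [1 + 2 * xi + ei for xi, ei in zip(x, e)]
a2, t = koenker_bassett(x, y)
print(f"alpha2_hat = {a2:.4f}, t = {t:.2f}")
```

The t ratio is compared with the t distribution with n - 2 df; squaring it gives the equivalent F(1, n - 2) statistic mentioned in the text.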

A Note Regarding the Tests of Heteroscedasticity 

We have discussed several tests of heteroscedasticity in this section. So how do we decide which is the best test? This is not an easy question to answer, for these tests are based on various assumptions. In comparing the tests, we need to pay attention to their size (or level of significance), power (the probability of rejecting a false hypothesis), and sensitivity to outliers.

We have already pointed out some of the limitations of the popular and easy-to-apply White's test of heteroscedasticity. As a result of these limitations, it may have low power against the alternatives. Besides, the test is of little help in identifying the factors or variables that cause heteroscedasticity.

Similarly, the Breusch-Pagan-Godfrey test is sensitive to the assumption of normality. In contrast, the Koenker-Bassett test does not rely on the normality assumption and may therefore be more powerful.³¹ In the Goldfeld-Quandt test, if we omit too many observations, we may diminish the power of the test.

It is beyond the scope of this text to provide a comparative analysis of the various 
heteroscedasticity tests. But the interested reader may refer to the article by John Lyon and 
Chin-Ling Tsai to get some idea about the strengths and weaknesses of the various tests of 
heteroscedasticity. 32 

11.6 Remedial Measures 


As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency
properties of the OLS estimators, but they are no longer efficient, not even asymptotically
(i.e., in large samples). This lack of efficiency makes the usual hypothesis-testing
procedure of dubious value. Therefore, remedial measures may be called for. There are two
approaches to remediation: when $\sigma_i^2$ is known and when $\sigma_i^2$ is not known.

When $\sigma_i^2$ Is Known: The Method of Weighted Least Squares

As we have seen in Section 11.3, if $\sigma_i^2$ is known, the most straightforward method of
correcting heteroscedasticity is by means of weighted least squares, for the estimators thus
obtained are BLUE.

31 For details, see William H. Greene, Econometric Analysis, 6th ed., Pearson/Prentice-Hall, New Jersey,
2008, pp. 165-167.

32 See their article, "A Comparison of Tests of Heteroscedasticity," The Statistician, vol. 45, no. 3,
1996, pp. 337-349.
EXAMPLE 11.7 Illustration of the Method of Weighted Least Squares


To illustrate the method, suppose we want to study the relationship between compensation
and employment size for the data presented in Table 11.1. For simplicity, we measure
employment size by 1 (1-4 employees), 2 (5-9 employees), ..., 9 (1,000-2,499
employees), although we could also measure it by the midpoint of the various employment
classes given in the table.

Now letting Y represent average compensation per employee ($) and X the employment
size, we run the following regression (see Eq. [11.3.6]):

$$\frac{Y_i}{\sigma_i} = \beta_1^*\left(\frac{1}{\sigma_i}\right) + \beta_2^*\left(\frac{X_i}{\sigma_i}\right) + \left(\frac{u_i}{\sigma_i}\right) \qquad (11.6.1)$$

where $\sigma_i$ are the standard deviations of wages as reported in Table 11.1. The necessary
raw data to run this regression are given in Table 11.4.

TABLE 11.4 Illustration of Weighted Least-Squares Regression

Compensation, Y   Employment Size, X   $\sigma_i$   $Y_i/\sigma_i$   $X_i/\sigma_i$
3,396             1                    742.2        4.5664           0.0013
3,787             2                    851.4        4.4480           0.0023
4,013             3                    727.8        5.5139           0.0041
4,104             4                    805.06       5.0978           0.0050
4,146             5                    929.9        4.4585           0.0054
4,241             6                    1,080.6      3.9247           0.0055
4,387             7                    1,241.2      3.5288           0.0056
4,538             8                    1,307.7      3.4702           0.0061
4,843             9                    1,110.7      4.3532           0.0081

Note: In regression (11.6.2), the dependent variable is $(Y_i/\sigma_i)$ and the independent variables
are $(1/\sigma_i)$ and $(X_i/\sigma_i)$.

Source: Data on Y and $\sigma_i$ (standard deviation of compensation) are from Table 11.1.
Employment size: 1 = 1-4 employees, 2 = 5-9 employees, etc. The latter data are also from
Table 11.1.


Before going on to the regression results, note that Eq. (11.6.1) has no intercept term.
(Why?) Therefore, one will have to use the regression-through-the-origin model to
estimate $\beta_1^*$ and $\beta_2^*$, a topic discussed in Chapter 6. But most computer packages these
days have an option to suppress the intercept term (see Minitab or EViews, for example).
Also note another interesting feature of Eq. (11.6.1): It has two explanatory variables,
$(1/\sigma_i)$ and $(X_i/\sigma_i)$, whereas if we were to use OLS, regressing compensation on employment
size, that regression would have a single explanatory variable, $X_i$. (Why?)

The regression results of WLS are as follows:

$$\widehat{\left(\frac{Y_i}{\sigma_i}\right)} = 3406.639\left(\frac{1}{\sigma_i}\right) + 154.153\left(\frac{X_i}{\sigma_i}\right) \qquad (11.6.2)$$

se = (80.983)   (16.959)
t = (42.066)   (9.090)   $R^2 = 0.9993$ 33

For comparison, we give the usual or unweighted OLS regression results:

$$\hat Y_i = 3417.833 + 148.767\,X_i \qquad (11.6.3)$$

se = (81.136)   (14.418)
t = (42.125)   (10.318)   $R^2 = 0.9383$

In Exercise 11.7 you are asked to compare these two regressions. 
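As a check on Eqs. (11.6.2) and (11.6.3), the WLS estimates can be computed as OLS through the origin on the transformed variables of Eq. (11.6.1). The sketch below uses hypothetical helper names and is not a package routine. It builds $1/\sigma_i$, $X_i/\sigma_i$ and $Y_i/\sigma_i$ directly from the Y, X and $\sigma_i$ columns of Table 11.4 rather than from the rounded transformed columns, so its WLS coefficients need not match Eq. (11.6.2) in every digit. Setting every $\sigma_i$ to 1 turns the same routine into ordinary OLS with an intercept.

```python
def ols_origin_2(z1, z2, y):
    """OLS of y on z1 and z2 with no intercept, via the 2x2 normal equations."""
    a11 = sum(a * a for a in z1)
    a12 = sum(a * b for a, b in zip(z1, z2))
    a22 = sum(b * b for b in z2)
    c1 = sum(a * v for a, v in zip(z1, y))
    c2 = sum(b * v for b, v in zip(z2, y))
    det = a11 * a22 - a12 * a12
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


def wls(x, y, sigma):
    """Weighted least squares with known sigma_i, written as in Eq. (11.6.1)."""
    z1 = [1 / s for s in sigma]                    # 1 / sigma_i
    z2 = [xi / s for xi, s in zip(x, sigma)]       # X_i / sigma_i
    ys = [yi / s for yi, s in zip(y, sigma)]       # Y_i / sigma_i
    return ols_origin_2(z1, z2, ys)                # (beta1*, beta2*)


# Columns Y, X and sigma_i of Table 11.4.
Y = [3396, 3787, 4013, 4104, 4146, 4241, 4387, 4538, 4843]
X = [1, 2, 3, 4, 5, 6, 7, 8, 9]
SIGMA = [742.2, 851.4, 727.8, 805.06, 929.9, 1080.6, 1241.2, 1307.7, 1110.7]

if __name__ == "__main__":
    b1_w, b2_w = wls(X, Y, SIGMA)            # compare with Eq. (11.6.2)
    b1_o, b2_o = wls(X, Y, [1.0] * len(X))   # equal weights: plain OLS, Eq. (11.6.3)
    print(f"WLS: {b1_w:.3f}, {b2_w:.3f}   OLS: {b1_o:.3f}, {b2_o:.3f}")
```

The equal-weight run reproduces the unweighted intercept and slope of Eq. (11.6.3), which is a convenient sanity check on the transformed regression.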


33 As noted in footnote 3 of Chapter 6, the R 2 of the regression through the origin is not directly 
comparable with the R 2 of the intercept-present model. The reported R 2 of 0.9993 takes this 
difference into account. (See the various packages for further details about how the R 2 is corrected to 
take into account the absence of the intercept term. See also Appendix 6A, Sec. 6A1.) 
When $\sigma_i^2$ Is Not Known

As noted earlier, if the true $\sigma_i^2$ are known, we can use the WLS method to obtain BLUE
estimators. Since the true $\sigma_i^2$ are rarely known, is there a way of obtaining consistent (in the
statistical sense) estimates of the variances and covariances of OLS estimators even if there is
heteroscedasticity? The answer is yes.

White's Heteroscedasticity-Consistent Variances and Standard Errors
White has shown that this estimation can be performed so that asymptotically valid (i.e.,
large-sample) statistical inferences can be made about the true parameter values. 34 We will
not present the mathematical details, for they are beyond the scope of this book. However,
Appendix 11A.4 outlines White's procedure. Nowadays, several computer packages present
White's heteroscedasticity-corrected variances and standard errors along with the
usual OLS variances and standard errors. 35 Incidentally, White's heteroscedasticity-corrected
standard errors are also known as robust standard errors.
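For the two-variable model the idea behind White's correction can be written compactly: the usual slope variance $\hat\sigma^2/\sum x_i^2$ is replaced by $\sum x_i^2 \hat u_i^2 / (\sum x_i^2)^2$, with $x_i$ in deviation form. The Python sketch below assumes this basic form (often labeled HC0); regression packages frequently apply small-sample adjustments to it, so their robust standard errors may differ somewhat. The function name is hypothetical, and the tiny data set is made up so the results can be checked by hand.

```python
import math


def ols_white_se(x, y):
    """Two-variable OLS with conventional and White (HC0) standard errors.

    Returns (b1, b2, se_ols, se_white), where both standard errors refer to b2.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dev = [xi - xbar for xi in x]                    # x_i in deviation form
    sxx = sum(d * d for d in dev)
    b2 = sum(d * (yi - ybar) for d, yi in zip(dev, y)) / sxx
    b1 = ybar - b2 * xbar
    u = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]  # OLS residuals
    # Usual OLS formula: sigma-hat^2 / sum(x_i^2), with sigma-hat^2 = RSS / (n - 2)
    var_ols = sum(e * e for e in u) / (n - 2) / sxx
    # White (HC0): sum(x_i^2 * u_i^2) / (sum(x_i^2))^2
    var_white = sum(d * d * e * e for d, e in zip(dev, u)) / sxx ** 2
    return b1, b2, math.sqrt(var_ols), math.sqrt(var_white)


if __name__ == "__main__":
    b1, b2, se_ols, se_white = ols_white_se([0, 1, 2], [0, 2, 1])
    print(b1, b2, se_ols, se_white)
```

Note that the coefficients themselves are the ordinary OLS estimates; only the standard errors change, which is exactly what the comparison in Example 11.8 shows.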


EXAMPLE 11.8 Illustration of White's Procedure

As an example, we quote the following results due to Greene: 36

$$\hat Y_i = 832.91 - 1834.2\,(\text{Income}) + 1587.04\,(\text{Income})^2 \qquad (11.6.4)$$

OLS se = (327.3)   (829.0)   (519.1)
t = (2.54)   (-2.21)   (3.06)
White se = (460.9)   (1243.0)   (830.0)
t = (1.81)   (-1.48)   (1.91)


where Y = per capita expenditure on public schools by state in 1979 and Income = per 
capita income by state in 1979. The sample consisted of 50 states plus Washington, DC. 


As the preceding results show, (White’s) heteroscedasticity-corrected standard errors are 
considerably larger than the OLS standard errors and therefore the estimated t values are 
much smaller than those obtained by OLS. On the basis of the latter, both the regressors 
are statistically significant at the 5 percent level, whereas on the basis of White estimators 
they are not. However, it should be pointed out that White’s heteroscedasticity-corrected 
standard errors can be larger or smaller than the uncorrected standard errors. 

Since White's heteroscedasticity-consistent estimators of the variances are now available
in established regression packages, it is recommended that the reader report them. As
Wallace and Silver note:

Generally speaking, it is probably a good idea to use the WHITE option [available in regression
programs] routinely, perhaps comparing the output with regular OLS output as a check to
see whether heteroscedasticity is a serious problem in a particular set of data. 37

Plausible Assumptions about Heteroscedasticity Pattern 

Apart from being a large-sample procedure, one drawback of the White procedure is that 
the estimators thus obtained may not be so efficient as those obtained by methods that 

34 See H. White, op. cit. 

35 More technically, they are known as heteroscedasticity-consistent covariance matrix 
estimators. 

36 William H. Greene, Econometric Analysis, 2d ed., Macmillan, New York, 1993, p. 385. 

37 T. Dudley Wallace and J. Lew Silver, Econometrics: An Introduction, Addison-Wesley, Reading, Mass., 
1988, p. 265. 
transform data to reflect specific types of heteroscedasticity. To illustrate this, let us revert
to the two-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

We now consider several assumptions about the pattern of heteroscedasticity. 

ASSUMPTION 1 The error variance is proportional to $X_i^2$:

$$E(u_i^2) = \sigma^2 X_i^2 \qquad (11.6.5)$$ 38


If, as a matter of “speculation,” graphical methods, or Park and Glejser approaches, it is
believed that the variance of $u_i$ is proportional to the square of the explanatory variable X
(see Figure 11.10), one may transform the original model as follows. Divide the original
model through by $X_i$:

$$\frac{Y_i}{X_i} = \frac{\beta_1}{X_i} + \beta_2 + \frac{u_i}{X_i} = \beta_1\frac{1}{X_i} + \beta_2 + v_i \qquad (11.6.6)$$

where $v_i$ is the transformed disturbance term, equal to $u_i/X_i$. Now it is easy to verify that

$$E(v_i^2) = E\left(\frac{u_i}{X_i}\right)^2 = \frac{1}{X_i^2}E(u_i^2) = \sigma^2 \quad \text{using (11.6.5)}$$

Hence the variance of $v_i$ is now homoscedastic, and one may proceed to apply OLS to the
transformed equation (11.6.6), regressing $Y_i/X_i$ on $1/X_i$.


FIGURE 11.10 Error variance proportional to $X^2$.

38 Recall that we have already encountered this assumption in our discussion of the Goldfeld-Quandt
test.
Notice that in the transformed regression the intercept term $\beta_2$ is the slope coefficient in
the original equation and the slope coefficient $\beta_1$ is the intercept term in the original model.
Therefore, to get back to the original model we shall have to multiply the estimated
Eq. (11.6.6) by $X_i$. An application of this transformation is given in Exercise 11.20.
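The transformation of Eq. (11.6.6), including the swap of intercept and slope, can be sketched as follows. The function names are hypothetical, and the data in the usage example are made up and contain no error term, so the original coefficients should be recovered exactly.

```python
def ols(x, y):
    """Two-variable OLS; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b2 * xbar, b2


def fit_variance_prop_x_squared(x, y):
    """Estimate Y = beta1 + beta2*X + u when var(u_i) is proportional to X_i^2.

    Runs OLS of Y/X on 1/X (Eq. 11.6.6). The fitted intercept estimates beta2
    and the fitted slope estimates beta1, so the two are swapped on return.
    Requires every X_i to be nonzero.
    """
    inv_x = [1 / xi for xi in x]
    y_over_x = [yi / xi for xi, yi in zip(x, y)]
    intercept, slope = ols(inv_x, y_over_x)
    return slope, intercept  # (beta1_hat, beta2_hat) of the original model


if __name__ == "__main__":
    x = [1, 2, 4, 5, 8]
    y = [3 + 0.5 * xi for xi in x]  # exact line: beta1 = 3, beta2 = 0.5
    print(fit_variance_prop_x_squared(x, y))
```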


ASSUMPTION 2 The error variance is proportional to $X_i$. The square root transformation:

$$E(u_i^2) = \sigma^2 X_i \qquad (11.6.7)$$


If it is believed that the variance of $u_i$, instead of being proportional to the squared $X_i$,
is proportional to $X_i$ itself, then the original model can be transformed as follows (see
Figure 11.11):

$$\frac{Y_i}{\sqrt{X_i}} = \frac{\beta_1}{\sqrt{X_i}} + \beta_2\sqrt{X_i} + \frac{u_i}{\sqrt{X_i}} = \beta_1\frac{1}{\sqrt{X_i}} + \beta_2\sqrt{X_i} + v_i \qquad (11.6.8)$$

where $v_i = u_i/\sqrt{X_i}$ and where $X_i > 0$.

Given Assumption 2, one can readily verify that $E(v_i^2) = \sigma^2$, a homoscedastic situation.
Therefore, one may proceed to apply OLS to Eq. (11.6.8), regressing $Y_i/\sqrt{X_i}$ on $1/\sqrt{X_i}$
and $\sqrt{X_i}$.

Note an important feature of the transformed model: It has no intercept term. Therefore,
one will have to use the regression-through-the-origin model to estimate $\beta_1$ and $\beta_2$. Having
run Eq. (11.6.8), one can get back to the original model simply by multiplying Eq. (11.6.8)
by $\sqrt{X_i}$.

An interesting case is the zero intercept model, namely, $Y_i = \beta_2 X_i + u_i$. In this case,
Eq. (11.6.8) becomes:

$$\frac{Y_i}{\sqrt{X_i}} = \beta_2\sqrt{X_i} + \frac{u_i}{\sqrt{X_i}} \qquad (11.6.8a)$$


FIGURE 11.11 Error variance proportional to X.
And it can be shown that

$$\hat\beta_2 = \frac{\bar Y}{\bar X} \qquad (11.6.8b)$$

That is, the weighted least-squares estimator is simply the ratio of the means of the dependent
and explanatory variables. (To prove Eq. [11.6.8b], just apply the regression-through-the-origin
formula given in Eq. [6.1.6].)
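Eq. (11.6.8b) is easy to confirm numerically. The regression through the origin of $Y_i/\sqrt{X_i}$ on $\sqrt{X_i}$ gives $\hat\beta_2 = \sum \sqrt{X_i}\,(Y_i/\sqrt{X_i}) / \sum X_i = \sum Y_i / \sum X_i$, which equals $\bar Y/\bar X$. The sketch below (hypothetical function names, made-up data) runs the regression of Eq. (11.6.8a) and compares the result with the ratio of means.

```python
import math


def slope_through_origin(x, y):
    """OLS slope with no intercept: sum(x*y) / sum(x**2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def beta2_zero_intercept(x, y):
    """Regress Y/sqrt(X) on sqrt(X) with no intercept, as in Eq. (11.6.8a).

    Assumes every X_i > 0.
    """
    root_x = [math.sqrt(xi) for xi in x]
    y_scaled = [yi / r for yi, r in zip(y, root_x)]
    return slope_through_origin(root_x, y_scaled)


if __name__ == "__main__":
    x = [1.0, 4.0, 9.0, 16.0]
    y = [2.0, 3.0, 7.0, 10.0]
    ratio_of_means = (sum(y) / len(y)) / (sum(x) / len(x))
    print(beta2_zero_intercept(x, y), ratio_of_means)
```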


ASSUMPTION 3 The error variance is proportional to the square of the mean value of Y.

$$E(u_i^2) = \sigma^2[E(Y_i)]^2 \qquad (11.6.9)$$


Equation (11.6.9) postulates that the variance of $u_i$ is proportional to the square of the
expected value of Y (see Figure 11.8e). Now

$$E(Y_i) = \beta_1 + \beta_2 X_i$$

Therefore, if we transform the original equation as follows,

$$\frac{Y_i}{E(Y_i)} = \frac{\beta_1}{E(Y_i)} + \beta_2\frac{X_i}{E(Y_i)} + \frac{u_i}{E(Y_i)} = \beta_1\left(\frac{1}{E(Y_i)}\right) + \beta_2\left(\frac{X_i}{E(Y_i)}\right) + v_i \qquad (11.6.10)$$

where $v_i = u_i/E(Y_i)$, it can be seen that $E(v_i^2) = \sigma^2$; that is, the disturbances $v_i$ are
homoscedastic. Hence, it is regression (11.6.10) that will satisfy the homoscedasticity
assumption of the classical linear regression model.

The transformation (11.6.10) is, however, inoperational because $E(Y_i)$ depends on $\beta_1$
and $\beta_2$, which are unknown. Of course, we know $\hat Y_i = \hat\beta_1 + \hat\beta_2 X_i$, which is an estimator of
$E(Y_i)$. Therefore, we may proceed in two steps: First, we run the usual OLS regression,
disregarding the heteroscedasticity problem, and obtain $\hat Y_i$. Then, using the estimated $\hat Y_i$, we
transform our model as follows:

$$\frac{Y_i}{\hat Y_i} = \beta_1\left(\frac{1}{\hat Y_i}\right) + \beta_2\left(\frac{X_i}{\hat Y_i}\right) + v_i \qquad (11.6.11)$$


where $v_i = (u_i/\hat Y_i)$. In Step 2, we run the regression (11.6.11). Although $\hat Y_i$ are not exactly
$E(Y_i)$, they are consistent estimators; that is, as the sample size increases indefinitely, they
converge to the true $E(Y_i)$. Hence, the transformation (11.6.11) will perform satisfactorily in
practice if the sample size is reasonably large.
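The two steps can be sketched as below. This is an illustration under assumptions, not a definitive routine: the function names are hypothetical, the model has a single regressor, and none of the fitted values $\hat Y_i$ may be zero (in practice they should also be positive for the weighting to make sense). The usage data are made up, with errors that grow in proportion to the mean of Y.

```python
def ols(x, y):
    """Two-variable OLS; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b2 * xbar, b2


def ols_origin_2(z1, z2, y):
    """OLS of y on z1 and z2 with no intercept, via the 2x2 normal equations."""
    a11 = sum(a * a for a in z1)
    a12 = sum(a * b for a, b in zip(z1, z2))
    a22 = sum(b * b for b in z2)
    c1 = sum(a * v for a, v in zip(z1, y))
    c2 = sum(b * v for b, v in zip(z2, y))
    det = a11 * a22 - a12 * a12
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


def two_step_by_fitted_y(x, y):
    """Two-step estimator behind Eq. (11.6.11).

    Step 1: OLS of Y on X, ignoring heteroscedasticity, to obtain Y-hat.
    Step 2: OLS through the origin of Y/Y-hat on 1/Y-hat and X/Y-hat.
    """
    b1, b2 = ols(x, y)
    yhat = [b1 + b2 * xi for xi in x]
    z1 = [1 / h for h in yhat]
    z2 = [xi / h for xi, h in zip(x, yhat)]
    ys = [yi / h for yi, h in zip(y, yhat)]
    return ols_origin_2(z1, z2, ys)  # (beta1_hat, beta2_hat)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [5 + 2 * xi + (-1) ** xi * 0.1 * (5 + 2 * xi) for xi in x]
    print(two_step_by_fitted_y(x, y))
```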


ASSUMPTION 4 A log transformation such as

$$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i \qquad (11.6.12)$$

very often reduces heteroscedasticity when compared with the regression

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
This result arises because log transformation compresses the scales in which the variables
are measured, thereby reducing a tenfold difference between two values to a twofold
difference. Thus, the number 80 is 10 times the number 8, but ln 80 (= 4.3820) is about
twice as large as ln 8 (= 2.0794).
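The arithmetic is easy to reproduce with Python's standard `math.log`, which computes the natural logarithm:

```python
import math

raw_ratio = 80 / 8                        # ratio on the original scale
log_ratio = math.log(80) / math.log(8)    # ratio after taking natural logs
print(f"ln 8 = {math.log(8):.4f}, ln 80 = {math.log(80):.4f}")
print(f"raw ratio = {raw_ratio:.1f}, log ratio = {log_ratio:.3f}")
```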

An additional advantage of the log transformation is that the slope coefficient $\beta_2$ measures
the elasticity of Y with respect to X, that is, the percentage change in Y for a percentage
change in X. For example, if Y is consumption and X is income, $\beta_2$ in Eq. (11.6.12) will
measure income elasticity, whereas in the original model $\beta_2$ measures only the rate of
change of mean consumption for a unit change in income. It is one reason why the log
models are quite popular in empirical econometrics. (For some of the problems associated
with log transformation, see Exercise 11.4.)

To conclude our discussion of the remedial measures, we reemphasize that all the
transformations discussed previously are ad hoc; we are essentially speculating about
the nature of $\sigma_i^2$. Which of the transformations discussed previously will work will depend
on the nature of the problem and the severity of heteroscedasticity. There are some
additional problems with the transformations we have considered that should be borne
in mind:

1. When we go beyond the two-variable model, we may not know a priori which of the 
X variables should be chosen for transforming the data. 39 

2. Log transformation as discussed in Assumption 4 is not applicable if some of the Y
and X values are zero or negative. 40

3. Then there is the problem of spurious correlation. This term, due to Karl Pearson,
refers to the situation where correlation is found to be present between the ratios of
variables even though the original variables are uncorrelated or random. 41 Thus, in the model
$Y_i = \beta_1 + \beta_2 X_i + u_i$, Y and X may not be correlated but in the transformed model
$Y_i/X_i = \beta_1(1/X_i) + \beta_2$, $Y_i/X_i$ and $1/X_i$ are often found to be correlated.

4. When $\sigma_i^2$ are not directly known and are estimated from one or more of the
transformations that we have discussed earlier, all our testing procedures using the t tests,
F tests, etc., are, strictly speaking, valid only in large samples. Therefore, one has to be
careful in interpreting the results based on the various transformations in small or finite
samples. 42
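Pearson's point in item 3 (see also footnote 41) can be illustrated by a small simulation. The sketch assumes three mutually independent variables drawn uniformly between 1 and 3, an arbitrary choice made only for illustration, and the function names are hypothetical. The variables themselves are essentially uncorrelated, yet the ratios $X_1/X_3$ and $X_2/X_3$ come out clearly positively correlated (in the population, about 0.53 for this particular setup) simply because they share the common divisor $X_3$.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def ratio_correlation_demo(n=20000, seed=12345):
    """Correlation of X1 with X2, and of X1/X3 with X2/X3, for independent X's."""
    rng = random.Random(seed)                       # seeded for reproducibility
    x1 = [rng.uniform(1, 3) for _ in range(n)]
    x2 = [rng.uniform(1, 3) for _ in range(n)]
    x3 = [rng.uniform(1, 3) for _ in range(n)]
    raw = pearson(x1, x2)
    ratios = pearson([a / c for a, c in zip(x1, x3)],
                     [b / c for b, c in zip(x2, x3)])
    return raw, ratios


if __name__ == "__main__":
    raw, ratios = ratio_correlation_demo()
    print(f"corr(X1, X2) = {raw:.3f}, corr(X1/X3, X2/X3) = {ratios:.3f}")
```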


11.7 Concluding Examples 

In concluding our discussion of heteroscedasticity we present three examples illustrating 
the main points made in this chapter. 


39 However, as a practical matter, one may plot $\hat u_i^2$ against each variable and decide which X variable
may be used for transforming the data. (See Fig. 11.9.)

40 Sometimes we can use $\ln(Y_i + k)$ or $\ln(X_i + k)$, where k is a positive number chosen in such a way
that all the values of Y and X become positive.

41 For example, if $X_1$, $X_2$, and $X_3$ are mutually uncorrelated ($r_{12} = r_{13} = r_{23} = 0$) and we find that the
(values of the) ratios $X_1/X_3$ and $X_2/X_3$ are correlated, then there is spurious correlation. "More
generally, correlation may be described as spurious if it is induced by the method of handling the data and
is not present in the original material." M. G. Kendall and W. R. Buckland, A Dictionary of Statistical
Terms, Hafner Publishing, New York, 1972, p. 143.

42 For further details, see George G. Judge et al., op. cit., Section 14.4, pp. 415-420. 
EXAMPLE 11.9 Child Mortality Revisited


Let us return to the child mortality example we have considered on several occasions. From 
data for 64 countries, we obtained the regression results shown in Eq. (8.1.4). Since the data 
are cross-sectional, involving diverse countries with different child mortality experiences, it 
is likely that we might encounter heteroscedasticity. To find this out, let us first consider 
the residuals obtained from Eq. (8.1.4). These residuals are plotted in Figure 11.12. From 
this figure it seems that the residuals do not show any distinct pattern that might suggest 
heteroscedasticity. Nonetheless, appearances can be deceptive. So, let us apply the Park, 
Glejser, and White tests to see if there is any evidence of heteroscedasticity.


Park Test. Since there are two regressors, GNP and FLR, we can regress the squared residuals
from regression (8.1.4) on either of these variables. Or, we can regress them on the
estimated CM values ($= \widehat{CM}$) from regression (8.1.4). Using the latter, we obtained the
following results.

$$\hat u_i^2 = 854.4006 + 5.7016\,\widehat{CM}_i \qquad (11.7.1)$$

t = (1.2010)   (1.2428)   $r^2 = 0.024$

Note: $\hat u_i$ are the residuals obtained from regression (8.1.4) and $\widehat{CM}$ are the estimated values
of CM from regression (8.1.4).

As this regression shows, there is no systematic relation between the squared residuals
and the estimated CM values (why?), suggesting that the assumption of homoscedasticity
may be valid. Incidentally, regressing the log of the squared residual values on the log
of $\widehat{CM}$ did not change the conclusion.

Glejser Test. The absolute values of the residuals obtained from Eq. (8.1.4), when regressed
on the estimated CM value from the same regression, gave the following results:

$$|\hat u_i| = 22.3127 + 0.0646\,\widehat{CM}_i \qquad (11.7.2)$$

t = (2.8086)   (1.2622)   $r^2 = 0.0250$


Again, there is not much systematic relationship between the absolute values of the
residuals and the estimated CM values, as the t value of the slope coefficient is not statistically
significant.

White Test. Applying White's heteroscedasticity test with and without cross-product 
terms, we did not find any evidence of heteroscedasticity. We also reestimated Eq. (8.1.4) 
to obtain White's heteroscedasticity-consistent standard errors and t values, but the results 
were quite similar to those given in Eq. (8.1.4), which should not be surprising in view of 
the various heteroscedasticity tests we conducted earlier. 

In sum, it seems that our child mortality regression (8.1.4) does not suffer from 
heteroscedasticity. 


FIGURE 11.12 Residuals from regression (8.1.4).
Chapter 11 Heteroscedasticity: What Happens If the Error Variance Is Nonconstant? 397


EXAMPLE 11.10 R&D Expenditure, Sales, and Profits in 14 Industry Groupings in the United States


Table 11.5 gives data on research and development (R&D) expenditure, sales, and profits 
for 14 industry groupings in the United States (all figures in millions of dollars). Since the 
cross-sectional data presented in this table are quite heterogeneous, in a regression of 
R&D on sales, heteroscedasticity is likely. The regression results are as follows: 

$$\widehat{\text{R\&D}}_i = 1338 + 0.0437\,\text{Sales}_i \qquad (11.7.3)$$

se = (5015)   (0.0277)
t = (0.27)   (1.58)   $r^2 = 0.172$


Not surprisingly, there is a positive relationship between R&D and sales, although it is not 
statistically significant at the traditional levels. 


TABLE 11.5 Sales and Employment for Companies Performing Industrial R&D in the United States,
by Industry, 2005 (values are in millions of dollars)

    Industry                                                              Sales      R&D   Profits
 1  Food                                                                374,342    2,716   234,662
 2  Textiles, apparel, and leather                                       51,639      816    53,510
 3  Basic chemicals                                                     109,899    2,277    75,168
 4  Resin, synthetic rubber, fibers, and filament                       132,934    2,294    34,645
 5  Pharmaceuticals and medicines                                       273,377   34,839   127,639
 6  Plastics and rubber products                                         90,176    1,760    96,162
 7  Fabricated metal products                                           174,165    1,375   155,801
 8  Machinery                                                           230,941    8,531   143,472
 9  Computers and peripheral equipment                                   91,010    4,955    34,004
10  Semiconductor and other electronic components                       176,054   18,724    81,317
11  Navigational, measuring, electromedical, and control instruments    118,648   15,204    73,258
12  Electrical equipment, appliances, and components                    101,398    2,424    54,742
13  Aerospace products and parts                                        227,271   15,005    72,090
14  Medical equipment and supplies                                       56,661    4,374    52,443

Source: National Science Foundation, Division of Science Resources Statistics, Survey of Industrial
Research and Development: 2005, and the U.S. Census Bureau Annual Survey of Manufacturers, 2005.


To see if the regression (11.7.3) suffers from heteroscedasticity, we obtained the residuals,
$\hat u_i$, and the squared residuals, $\hat u_i^2$, from the model and plotted them against sales, as
shown in Figure 11.13. It seems from this figure that there is a systematic pattern between
the residuals and squared residuals and sales, perhaps suggesting that there is
heteroscedasticity. To test this formally, we used the Park, Glejser, and White tests, which gave
the following results:

Park Test 

$$\hat u_i^2 = -72{,}493{,}719 + 916.1\,\text{Sales}_i \qquad (11.7.4)$$

se = (54,940,238)   (303.9)
t = (-1.32)   (3.01)   $r^2 = 0.431$

The Park test suggests that there is a statistically significant positive relationship between 
squared residuals and sales. 




FIGURE 11.13 Residuals (a) and squared residuals (b) on sales.

Glejser Test

$$|\hat u_i| = -1003 + 0.04639\,\text{Sales}_i \qquad (11.7.5)$$

se = (2316)   (0.0128)
t = (-0.43)   (3.62)   $r^2 = 0.522$


The Glejser test also suggests that there is a systematic relationship between the absolute 
values of the residuals and sales, raising the possibility that the regression (11.7.3) suffers 
from heteroscedasticity. 
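Using the 14 observations of Table 11.5, the OLS regression (11.7.3) and the slopes of the Park regression (11.7.4) and the Glejser regression above can be reproduced with a few lines of standard-library Python. This is a sketch with a hypothetical helper; it reports only point estimates (standard errors are omitted for brevity), and the printed values should be close to the reported ones up to rounding.

```python
# Sales and R&D columns of Table 11.5 (millions of dollars), industries 1 to 14.
SALES = [374342, 51639, 109899, 132934, 273377, 90176, 174165,
         230941, 91010, 176054, 118648, 101398, 227271, 56661]
RD = [2716, 816, 2277, 2294, 34839, 1760, 1375,
      8531, 4955, 18724, 15204, 2424, 15005, 4374]


def ols(x, y):
    """Two-variable OLS; returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return b1, b2, [yi - b1 - b2 * xi for xi, yi in zip(x, y)]


# Original regression of R&D on sales, Eq. (11.7.3).
b1, b2, u = ols(SALES, RD)
# Park regression as reported in Eq. (11.7.4): squared residuals on sales.
park_b1, park_b2, _ = ols(SALES, [e * e for e in u])
# Glejser regression: absolute residuals on sales.
gl_b1, gl_b2, _ = ols(SALES, [abs(e) for e in u])

print(f"R&D = {b1:.2f} + {b2:.4f} Sales")
print(f"Park slope = {park_b2:.1f}, Glejser slope = {gl_b2:.5f}")
```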


White Test

$$\hat u_i^2 = -46{,}746{,}325 + 578\,\text{Sales}_i + 0.000846\,\text{Sales}_i^2 \qquad (11.7.6)$$

se = (112,224,348)   (1308)   (0.003171)
t = (-0.42)   (0.44)   (0.27)   $R^2 = 0.435$


Using the $R^2$ value and n = 14, we obtain $nR^2 = 6.090$. Under the null hypothesis of no
heteroscedasticity, this should follow a chi-square distribution with 2 df (because there are
two regressors in Eq. [11.7.6]). The p value of obtaining a chi-square value of as much as
6.090 or greater is about 0.0476. Since this is a low value, the White test also suggests that
there is heteroscedasticity.
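For 2 degrees of freedom the chi-square upper-tail probability has the closed form $P(\chi^2_2 > c) = e^{-c/2}$, so the p value quoted above can be checked with the standard library alone. The function name below is hypothetical, and the shortcut is valid only for 2 df; other df would need a chi-square routine from a statistics library.

```python
import math


def white_nr2_pvalue_2df(r_squared, n):
    """Return (n*R^2, p value) for White's test with 2 df.

    Uses the exact chi-square(2) survival function exp(-c/2).
    """
    stat = n * r_squared
    return stat, math.exp(-stat / 2)


stat, p = white_nr2_pvalue_2df(0.435, 14)  # R^2 and n from Eq. (11.7.6)
print(f"nR^2 = {stat:.3f}, p = {p:.4f}")
```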


In sum, then, on the basis of the residual graphs and the Park, Glejser, and White tests, 
it seems that our R&D regression (11.7.3) suffers from heteroscedasticity. Since the true 
error variance is unknown, we cannot use the method of weighted least squares to obtain 
heteroscedasticity-corrected standard errors and t values. Therefore, we would have to 
make some educated guesses about the nature of the error variance. 

To conclude our example, we present below White's heteroscedasticity-consistent 
standard errors, as discussed in Section 11.6. 


$$\widehat{\text{R\&D}}_i = 1337.87 + 0.0437\,\text{Sales}_i \qquad (11.7.7)$$

se = (4892.447)   (0.0411)
t = (0.27)   (1.06)   $r^2 = 0.172$

Comparing Eq. (11.7.7) with Eq. (11.7.3) (the latter not having been corrected for 
heteroscedasticity), we see that the parameter estimates have not changed (as we 
would expect), the standard error of the intercept coefficient has decreased slightly, 
and the standard error of the slope coefficient has increased slightly. But remember 
that the White procedure is strictly a large-sample procedure, whereas we have only 
14 observations. 


EXAMPLE 11.11 Table 11.6 on the textbook website provides salary and related data on 94 school
districts in Northwest Ohio. Initially, the following regression was estimated from these data:

$$\ln(\text{Salary})_i = \beta_1 + \beta_2 \ln(\text{Famincome})_i + \beta_3 \ln(\text{Propvalue})_i + u_i$$

where Salary = mean salary of classroom teachers ($), famincome = mean family income
in the district ($), and propvalue = mean property value in the district ($).

Since this is a double-log model, all the slope coefficients are elasticities. Based on the 
various heteroscedasticity tests discussed in the text, it was found that the preceding 
model suffered from heteroscedasticity. We, therefore, obtained (White's) robust standard 
errors. The following table gives the results of the preceding regression with and without 
robust standard errors. 


Variable          Coefficient   OLS se             Robust se
Intercept         7.0198        0.8053 (8.7171)    0.7721 (9.0908)
ln(famincome)     0.2575        0.0799 (3.2230)    0.1009 (2.5516)
ln(propvalue)     0.0704        0.0207 (3.3976)    0.0460 (1.5311)
$R^2$             0.2198

Note: Figures in parentheses are the estimated t ratios.


Although the coefficient values and $R^2$ remain the same whether we use OLS or
White's method, the standard errors have changed; the most dramatic change is in the
standard error of the ln(propvalue) coefficient. The usual OLS would suggest that the
estimated coefficient of this variable is highly statistically significant, whereas White's robust
standard error suggests that this coefficient is not significant even at the 10 percent level.
The point of this example is that if there is heteroscedasticity, we should take it into 
account in estimating a model. 







400 Part Two Relaxing the Assumptions of the Classical Model 


11.8 A Caution about Overreacting to Heteroscedasticity 



Reverting to the R&D example discussed in the previous section, we saw that when we
used White's procedure to correct for heteroscedasticity in the original model
(11.7.3), the standard error of the slope coefficient increased and its t value decreased. Is
this change so significant that one should worry about it in practice? To put the matter
differently, when should we really worry about the heteroscedasticity problem? As one author
contends, “heteroscedasticity has never been a reason to throw out an otherwise good
model.” 43

Here it may be useful to bear in mind the caution sounded by John Fox: 

... unequal error variance is worth correcting only when the problem is severe. 

The impact of nonconstant error variance on the efficiency of ordinary least-squares
estimator and on the validity of least-squares inference depends on several factors, including
the sample size, the degree of variation in the $\sigma_i^2$, the configuration of the X [i.e.,
regressor] values, and the relationship between the error variance and the X's. It is therefore
not possible to develop wholly general conclusions concerning the harm produced by
heteroscedasticity. 44

Returning to the model (11.3.1), we saw earlier that the variance of the slope estimator,
var($\hat\beta_2$), is given by the usual formula shown in (11.2.3). Under GLS the variance of the slope
estimator, var($\hat\beta_2^*$), is given by (11.3.9). We know that the latter is more efficient than the
former. But how large does the former (i.e., OLS) variance have to be in relation to the GLS
variance before one should really worry about it? As a rule of thumb, Fox suggests that we
worry about this problem “. . . when the largest error variance is more than about 10 times
the smallest.” 45 Thus, returning to the Monte Carlo simulation results of Davidson and
MacKinnon presented in Section 11.4, consider the value of $\alpha = 2$. The variance of the
estimated $\beta_2$ is 0.04 under OLS and 0.012 under GLS, the ratio of the former to the latter
thus being about 3.33. 46 According to the Fox rule, the severity of heteroscedasticity in this
case may not be large enough to worry about.
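The comparison just made is simple arithmetic; the sketch below merely encodes it, with Fox's factor of about 10 as an assumed cutoff and a hypothetical function name. Fox states his rule for the ratio of the largest to the smallest error variance, whereas the comparison above applies the same factor to the ratio of the OLS to the GLS variance.

```python
def worth_worrying(var_a, var_b, threshold=10.0):
    """Return (ratio, flag): the larger variance divided by the smaller one,
    and whether that ratio exceeds Fox's rough threshold of about 10."""
    ratio = max(var_a, var_b) / min(var_a, var_b)
    return ratio, ratio > threshold


ratio, flag = worth_worrying(0.04, 0.012)  # OLS vs GLS variance at alpha = 2
print(f"ratio = {ratio:.2f}, exceeds threshold: {flag}")
```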

Also remember that, despite heteroscedasticity, OLS estimators are linear unbiased and 
are (under general conditions) asymptotically (i.e., in large samples) normally distributed. 

As we will see when we discuss other violations of the assumptions of the classical 
linear regression model, the caution sounded in this section is appropriate as a general rule. 
Otherwise, one can go overboard. 

Summary and 
Conclusions 

1. A critical assumption of the classical linear regression model is that the disturbances
$u_i$ all have the same variance, $\sigma^2$. If this assumption is not satisfied, there is
heteroscedasticity.

2. Heteroscedasticity does not destroy the unbiasedness and consistency properties of OLS 
estimators. 

3. But these estimators are no longer minimum variance or efficient. That is, they are not 
BLUE. 

43 N. Gregory Mankiw, "A Quick Refresher Course in Macroeconomics," Journal of Economic Literature,
vol. XXVIII, December 1990, p. 1648.

44 John Fox, Applied Regression Analysis, Linear Models, and Related Methods, Sage Publications,
California, 1997, p. 306.

45 Ibid., p. 307.

46 Note that we have squared the standard errors to obtain the variances. 
4. The BLUE estimators are provided by the method of weighted least squares, provided
the heteroscedastic error variances, $\sigma_i^2$, are known.

5. In the presence of heteroscedasticity, the variances of OLS estimators are not provided 
by the usual OLS formulas. But if we persist in using the usual OLS formulas, the t and 
F tests based on them can be highly misleading, resulting in erroneous conclusions. 

6. Documenting the consequences of heteroscedasticity is easier than detecting it. There 
are several diagnostic tests available, but one cannot tell for sure which will work in a 
given situation. 

7. Even if heteroscedasticity is suspected and detected, it is not easy to correct the problem.
If the sample is large, one can obtain White's heteroscedasticity-corrected standard
errors of OLS estimators and conduct statistical inference based on these standard errors.

8. Otherwise, on the basis of OLS residuals, one can make educated guesses of the likely 
pattern of heteroscedasticity and transform the original data in such a way that in the 
transformed data there is no heteroscedasticity. 


Questions 

11.1. State with brief reason whether the following statements are true, false, or uncertain: 

a. In the presence of heteroscedasticity OLS estimators are biased as well as 
inefficient. 

b. If heteroscedasticity is present, the conventional t and F tests are invalid. 

c. In the presence of heteroscedasticity the usual OLS method always overestimates
the standard errors of estimators.

d. If residuals estimated from an OLS regression exhibit a systematic pattern, it 
means heteroscedasticity is present in the data. 

e. There is no general test of heteroscedasticity that is free of any assumption about 
which variable the error term is correlated with. 

f. If a regression model is mis-specified (e.g., an important variable is omitted), the
OLS residuals will show a distinct pattern.

g. If a regressor that has nonconstant variance is (incorrectly) omitted from a 
model, the (OLS) residuals will be heteroscedastic. 

11.2. In a regression of average wages (W, $) on the number of employees (N) for a
random sample of 30 firms, the following regression results were obtained:*

$$\hat W = 7.5 + 0.001N \qquad (1)$$

t = n.a.   (16.10)   $R^2 = 0.90$

$$\widehat{W/N} = 0.008 + 7.8\,(1/N) \qquad (2)$$

t = (14.43)   (76.58)   $R^2 = 0.99$

a. How do you interpret the two regressions? 

b. What is the author assuming in going from Eq. (1) to Eq. (2)? Was he worried 
about heteroscedasticity? How do you know? 

c. Can you relate the slopes and intercepts of the two models? 

d. Can you compare the R 2 values of the two models? Why or why not? 

*See Dominick Salvatore, Managerial Economics, McGraw-Hill, New York, 1989, p. 157.
11.3. a. Can you estimate the parameters of the models

$$|\hat u_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$$

$$|\hat u_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$$

by the method of ordinary least squares? Why or why not?
b. If not, can you suggest a method, informal or formal, of estimating the parameters
of such models? (See Chapter 14.)

11.4. Although log models as shown in Eq. (11.6.12) often reduce heteroscedasticity, one
has to pay careful attention to the properties of the disturbance term of such models.
For example, the model

$$Y_i = \beta_1 X_i^{\beta_2} u_i \qquad (1)$$

can be written as

$$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_i + \ln u_i \qquad (2)$$

a. If $\ln u_i$ is to have zero expectation, what must be the distribution of $u_i$?

b. If $E(u_i) = 1$, will $E(\ln u_i) = 0$? Why or why not?

c. If $E(\ln u_i)$ is not zero, what can be done to make it zero?

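11.5. Show that $\hat\beta_2^*$ of Eq. (11.3.8) can also be expressed as

$$\hat\beta_2^* = \frac{\sum w_i y_i^* x_i^*}{\sum w_i x_i^{*2}}$$

and $\operatorname{var}(\hat\beta_2^*)$ given in Eq. (11.3.9) can also be expressed as

$$\operatorname{var}(\hat\beta_2^*) = \frac{1}{\sum w_i x_i^{*2}}$$

where $y_i^* = Y_i - \bar Y^*$ and $x_i^* = X_i - \bar X^*$ represent deviations from the weighted means $\bar Y^*$ and $\bar X^*$ defined as

$$\bar Y^* = \frac{\sum w_i Y_i}{\sum w_i} \qquad \bar X^* = \frac{\sum w_i X_i}{\sum w_i}$$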
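As a numerical check on Exercise 11.5 (a check, not the requested proof), the sketch below computes $\hat\beta_2^*$ and its variance both from the raw weighted sums and from deviations about the weighted means. It assumes NumPy, weights $w_i = 1/\sigma_i^2$, and reads the variance in Eq. (11.3.9) as $\sum w_i / [(\sum w_i)(\sum w_i X_i^2) - (\sum w_i X_i)^2]$. The data, the error pattern $\sigma_i = 0.5X_i$, and the function name are hypothetical.

```python
import numpy as np

def wls_two_forms(X, Y, w):
    """Return beta2* and its variance computed directly and via deviations from weighted means."""
    X, Y, w = (np.asarray(a, dtype=float) for a in (X, Y, w))
    sw, swx, swy = w.sum(), (w * X).sum(), (w * Y).sum()
    swxx, swxy = (w * X * X).sum(), (w * X * Y).sum()
    denom = sw * swxx - swx ** 2

    # Direct (raw-sum) forms, with w_i = 1 / sigma_i^2
    b_direct = (sw * swxy - swx * swy) / denom
    v_direct = sw / denom

    # Deviation forms: x_i* = X_i - Xbar*, y_i* = Y_i - Ybar*
    xs = X - swx / sw
    ys = Y - swy / sw
    b_dev = (w * xs * ys).sum() / (w * xs ** 2).sum()
    v_dev = 1.0 / (w * xs ** 2).sum()
    return b_direct, b_dev, v_direct, v_dev

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=25)
sigma = 0.5 * X                              # hypothetical heteroscedastic pattern
Y = 2.0 + 0.8 * X + rng.normal(0.0, sigma)
b_direct, b_dev, v_direct, v_dev = wls_two_forms(X, Y, 1.0 / sigma ** 2)
print(b_direct, b_dev)
print(v_direct, v_dev)
```

The two numbers on each printed line should agree to rounding error, which is what the algebra of the exercise predicts.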

11.6. For pedagogic purposes Hanushek and Jackson estimate the following model:

$$C_t = \beta_1 + \beta_2\,\text{GNP}_t + \beta_3 D_t + u_t \quad (1)$$

where $C_t$ = aggregate private consumption expenditure in year $t$, $\text{GNP}_t$ = gross national product in year $t$, and $D_t$ = national defense expenditures in year $t$, the objective of the analysis being to study the effect of defense expenditures on other expenditures in the economy.

Postulating that $\sigma_t^2 = \sigma^2(\text{GNP}_t)^2$, they transform (1) and estimate

$$C_t/\text{GNP}_t = \beta_1 (1/\text{GNP}_t) + \beta_2 + \beta_3 (D_t/\text{GNP}_t) + u_t/\text{GNP}_t \quad (2)$$

The empirical results based on the data for 1946-1975 were as follows (standard errors in parentheses):*

$$\hat C_t = \underset{(2.73)}{26.19} + \underset{(0.0060)}{0.6248}\,\text{GNP}_t - \underset{(0.0736)}{0.4398}\,D_t \qquad R^2 = 0.999$$

$$\widehat{C_t/\text{GNP}_t} = \underset{(2.22)}{25.92}\,(1/\text{GNP}_t) + \underset{(0.0068)}{0.6246} - \underset{(0.0597)}{0.4315}\,(D_t/\text{GNP}_t) \qquad R^2 = 0.875$$

a. What assumption is made by the authors about the nature of heteroscedasticity? Can you justify it?

b. Compare the results of the two regressions. Has the transformation of the original model improved the results, that is, reduced the estimated standard errors? Why or why not?

c. Can you compare the two $R^2$ values? Why or why not? (Hint: Examine the dependent variables.)

11.7. Refer to the estimated regression in Eqs. (11.6.2) and (11.6.3). The regression 
results are quite similar. What could account for this outcome? 

11.8. Prove that if $w_i = w$, a constant, for each $i$, then $\hat\beta_2^*$ and $\hat\beta_2$ as well as their variances are identical.

11.9. Refer to formulas (11.2.2) and (11.2.3). Assume

$$\sigma_i^2 = \sigma^2 k_i$$

where $\sigma^2$ is a constant and where $k_i$ are known weights, not necessarily all equal.

Using this assumption, show that the variance given in Eq. (11.2.2) can be expressed as

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_i^2} \cdot \frac{\sum x_i^2 k_i}{\sum x_i^2}$$

The first term on the right side is the variance formula given in Eq. (11.2.3), that is, $\operatorname{var}(\hat\beta_2)$ under homoscedasticity. What can you say about the nature of the relationship between $\operatorname{var}(\hat\beta_2)$ under heteroscedasticity and under homoscedasticity? (Hint: Examine the second term on the right side of the preceding formula.) Can you draw any general conclusions about the relationships between Eqs. (11.2.2) and (11.2.3)?

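11.10. In the model

$$Y_i = \beta_2 X_i + u_i \qquad (\text{Note: there is no intercept})$$

you are told that $\operatorname{var}(u_i) = \sigma^2 X_i^2$. Show that

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2 \sum X_i^4}{\left(\sum X_i^2\right)^2}$$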
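A Monte Carlo sketch can make Exercise 11.10 tangible: simulate many samples with $\operatorname{var}(u_i) = \sigma^2 X_i^2$, estimate $\beta_2$ by OLS through the origin in each, and compare the empirical variance of the estimates with the formula. The regressor values, $\sigma^2 = 2$, $\beta_2 = 1.5$, and the helper names are hypothetical choices; NumPy is assumed.

```python
import numpy as np

def var_formula(X, sigma2):
    """var(beta2_hat) = sigma^2 * sum(X^4) / (sum(X^2))^2 when var(u_i) = sigma^2 X_i^2."""
    X = np.asarray(X, dtype=float)
    return sigma2 * np.sum(X ** 4) / np.sum(X ** 2) ** 2

def var_monte_carlo(X, sigma2, beta2=1.5, reps=200_000, seed=1):
    """Empirical variance of OLS-through-the-origin estimates over simulated samples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Each row is one sample; the error s.d. is sigma * |X_i|, so var(u_i) = sigma^2 X_i^2
    u = rng.normal(0.0, np.sqrt(sigma2) * np.abs(X), size=(reps, X.size))
    Y = beta2 * X + u
    b2 = (Y @ X) / np.sum(X ** 2)          # beta2_hat = sum(X_i Y_i) / sum(X_i^2), per sample
    return b2.var()

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # hypothetical regressor values
print(var_formula(X, 2.0), var_monte_carlo(X, 2.0))
```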


*Eric A. Hanushek and John E. Jackson, Statistical Methods for Social Scientists, Academic, New York, 1977, p. 160.
TABLE 11.6

                          Asset Size (millions of dollars)
Year and
Quarter        1-10     10-25     25-50    50-100   100-250 250-1,000    1,000+
1971-I        6.696     6.929     6.858     6.966     7.819     7.557     7.860
  -II         6.826     7.311     7.299     7.081     7.907     7.685     7.351
  -III        6.338     7.035     7.082     7.145     7.691     7.309     7.088
  -IV         6.272     6.265     6.874     6.485     6.778     7.120     6.765
1972-I        6.692     6.236     7.101     7.060     7.104     7.584     6.717
  -II         6.818     7.010     7.719     7.009     8.064     7.457     7.280
  -III        6.783     6.934     7.182     6.923     7.784     7.142     6.619
  -IV         6.779     6.988     6.531     7.146     7.279     6.928     6.919
1973-I        7.291     7.428     7.272     7.571     7.583     7.053     6.630
  -II         7.766     9.071     7.818     8.692     8.608     7.571     6.805
  -III        7.733     8.357     8.090     8.357     7.680     7.654     6.772
  -IV         8.316     7.621     7.766     7.867     7.666     7.380     7.072

Source: Quarterly Financial Report for Manufacturing Corporations, Federal Trade Commission and the Securities and Exchange Commission, U.S. government, various issues.


Empirical Exercises

11.11. For the data given in Table 11.1, regress average compensation $Y$ on average productivity $X$, treating employment size as the unit of observation. Interpret your results, and see if your results agree with those given in Eq. (11.5.3).

a. From the preceding regression obtain the residuals $\hat u_i$.

b. Following the Park test, regress $\ln \hat u_i^2$ on $\ln X_i$ and verify the regression Eq. (11.5.4).

c. Following the Glejser approach, regress $|\hat u_i|$ on $X_i$ and then regress $|\hat u_i|$ on $\sqrt{X_i}$ and comment on your results.

d. Find the rank correlation between $|\hat u_i|$ and $X_i$ and comment on the nature of heteroscedasticity, if any, present in the data.

11.12. Table 11.6 gives data on the sales/cash ratio in U.S. manufacturing industries classified by the asset size of the establishment for the period 1971-I to 1973-IV. (The data are on a quarterly basis.) The sales/cash ratio may be regarded as a measure of income velocity in the corporate sector, that is, the number of times a dollar turns over.

a. For each asset size compute the mean and standard deviation of the sales/cash ratio.

b. Plot the mean value against the standard deviation as computed in (a), using asset size as the unit of observation.

c. By means of a suitable regression model decide whether the standard deviation of the ratio increases with the mean value. If not, how would you rationalize the result?

d. If there is a statistically significant relationship between the two, how would you transform the data so that there is no heteroscedasticity?

11.13. Bartlett's homogeneity-of-variance test.* Suppose there are $k$ independent sample variances $s_1^2, s_2^2, \ldots, s_k^2$ with $f_1, f_2, \ldots, f_k$ df, each from populations which are normally distributed with mean $\mu$ and variance $\sigma_i^2$. Suppose further that we want to test the null hypothesis $H_0\colon \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$; that is, each sample variance is an estimate of the same population variance $\sigma^2$.

If the null hypothesis is true, then

$$s^2 = \frac{\sum_{i=1}^{k} f_i s_i^2}{\sum f_i} = \frac{\sum f_i s_i^2}{f}$$

provides an estimate of the common (pooled) population variance $\sigma^2$, where $f_i = n_i - 1$, $n_i$ being the number of observations in the $i$th group, and where $f = \sum_{i=1}^{k} f_i$.

Bartlett has shown that the null hypothesis can be tested by the ratio $A/B$, which is approximately distributed as the $\chi^2$ distribution with $k - 1$ df, where

$$A = f \ln s^2 - \sum \left(f_i \ln s_i^2\right)$$

and

$$B = 1 + \frac{1}{3(k-1)}\left[\sum \frac{1}{f_i} - \frac{1}{f}\right]$$

Apply Bartlett's test to the data of Table 11.1 and verify that the hypothesis that population variances of employee compensation are the same in each employment size of the establishment cannot be rejected at the 5 percent level of significance.

Note: $f_i$, the df for each sample variance, is 9, since $n_i$ for each sample (i.e., employment class) is 10.

*See "Properties of Sufficiency and Statistical Tests," Proceedings of the Royal Society of London A, vol. 160, 1937, p. 268.
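For Exercise 11.12(a), and as an illustration of the Bartlett statistic defined in Exercise 11.13, here is a minimal standard-library sketch. It uses the Table 11.6 columns because Table 11.1 is not reproduced here; treating the seven asset-size columns as independent samples is an assumption made only to exercise the code, not a claim that the test is appropriate for these data. The figure 12.592 is the 5 percent $\chi^2$ critical value for 6 df; the function name is hypothetical.

```python
import math
import statistics

# Table 11.6, one list per asset-size class (millions of dollars), 1971-I through 1973-IV
TABLE_11_6 = {
    "1-10":      [6.696, 6.826, 6.338, 6.272, 6.692, 6.818, 6.783, 6.779, 7.291, 7.766, 7.733, 8.316],
    "10-25":     [6.929, 7.311, 7.035, 6.265, 6.236, 7.010, 6.934, 6.988, 7.428, 9.071, 8.357, 7.621],
    "25-50":     [6.858, 7.299, 7.082, 6.874, 7.101, 7.719, 7.182, 6.531, 7.272, 7.818, 8.090, 7.766],
    "50-100":    [6.966, 7.081, 7.145, 6.485, 7.060, 7.009, 6.923, 7.146, 7.571, 8.692, 8.357, 7.867],
    "100-250":   [7.819, 7.907, 7.691, 6.778, 7.104, 8.064, 7.784, 7.279, 7.583, 8.608, 7.680, 7.666],
    "250-1,000": [7.557, 7.685, 7.309, 7.120, 7.584, 7.457, 7.142, 6.928, 7.053, 7.571, 7.654, 7.380],
    "1,000+":    [7.860, 7.351, 7.088, 6.765, 6.717, 7.280, 6.619, 6.919, 6.630, 6.805, 6.772, 7.072],
}

def bartlett_statistic(samples):
    """Bartlett's A/B ratio; approximately chi-square with k - 1 df under H0."""
    k = len(samples)
    f_i = [len(s) - 1 for s in samples]                  # df of each sample variance
    s2_i = [statistics.variance(s) for s in samples]     # sample variances (divisor n_i - 1)
    f = sum(f_i)
    s2 = sum(fi * v for fi, v in zip(f_i, s2_i)) / f     # pooled variance s^2
    A = f * math.log(s2) - sum(fi * math.log(v) for fi, v in zip(f_i, s2_i))
    B = 1 + (sum(1 / fi for fi in f_i) - 1 / f) / (3 * (k - 1))
    return A / B

# Exercise 11.12(a): mean and standard deviation for each asset-size class
for size, ratios in TABLE_11_6.items():
    print(f"{size:>10}: mean = {statistics.mean(ratios):.3f}, sd = {statistics.stdev(ratios):.3f}")

# Bartlett's statistic for the seven groups (illustration only)
stat = bartlett_statistic(list(TABLE_11_6.values()))
print(f"A/B = {stat:.3f}; 5% critical chi-square value with 6 df is about 12.592")
```

The statistic does not depend on the units of measurement: multiplying every observation by a constant leaves $A/B$ unchanged.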

11.14. Consider the following regression-through-the-origin model:

$$Y_i = \beta X_i + u_i, \qquad \text{for } i = 1, 2$$

You are told that $u_1 \sim N(0, \sigma^2)$ and $u_2 \sim N(0, 2\sigma^2)$ and that they are statistically independent. If $X_1 = +1$ and $X_2 = -1$, obtain the weighted least-squares (WLS) estimate of $\beta$ and its variance. If in this situation you had assumed incorrectly that the two error variances were the same (say, equal to $\sigma^2$), what would be the OLS estimator of $\beta$ and its variance? Compare these estimates with the estimates obtained by the method of WLS. What general conclusion do you draw?*

11.15. Table 11.7 gives data on 81 cars about MPG (average miles per gallon), HP (engine horsepower), VOL (cubic feet of cab space), SP (top speed, miles per hour), and WT (vehicle weight in 100 lbs.).

a. Consider the following model:

$$\text{MPG}_i = \beta_1 + \beta_2\,\text{SP}_i + \beta_3\,\text{HP}_i + \beta_4\,\text{WT}_i + u_i$$

Estimate the parameters of this model and interpret the results. Do they make economic sense?

b. Would you expect the error variance in the preceding model to be heteroscedastic? Why?

c. Use the White test to find out if the error variance is heteroscedastic. 

d. Obtain White’s heteroscedasticity-consistent standard errors and t values and 
compare your results with those obtained from OLS. 

e. If heteroscedasticity is established, how would you transform the data so that in 
the transformed data the error variance is homoscedastic? Show the necessary 
calculations. 
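Exercise 11.15(c) asks for White's test. The following is a minimal sketch, assuming NumPy, of the general form in which the squared OLS residuals are regressed on the regressors, their squares, and their cross products, and $n \cdot R^2$ from that auxiliary regression is compared with the $\chi^2$ distribution whose df equal the number of auxiliary regressors (excluding the constant). The data are simulated stand-ins for SP, HP, and WT, not Table 11.7, and `r_squared` and `white_test` are hypothetical helper names.

```python
import itertools
import numpy as np

def r_squared(y, Z):
    """OLS of y on Z (Z already contains a constant column); returns residuals and R^2."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return resid, r2

def white_test(y, X):
    """White's general test. X holds the regressors without the constant.
    Returns (n * R^2 of the auxiliary regression, df)."""
    n, m = X.shape
    u, _ = r_squared(y, np.hstack([np.ones((n, 1)), X]))
    cols = [X[:, j] for j in range(m)]                                           # levels
    cols += [X[:, j] ** 2 for j in range(m)]                                     # squares
    cols += [X[:, a] * X[:, b] for a, b in itertools.combinations(range(m), 2)]  # cross products
    Z = np.column_stack([np.ones(n)] + cols)
    _, r2_aux = r_squared(u ** 2, Z)
    return n * r2_aux, Z.shape[1] - 1

# Hypothetical data with three regressors; the error s.d. rises with the first one
rng = np.random.default_rng(42)
n = 500
X = rng.uniform(1.0, 10.0, size=(n, 3))
y = 5.0 + X @ np.array([0.5, -0.3, 1.0]) + rng.normal(0.0, X[:, 0])
stat, df = white_test(y, X)
print(stat, df)     # with 9 df, the 5% critical chi-square value is about 16.92
```

With several regressors the number of auxiliary terms grows quickly, which is the degrees-of-freedom concern raised in Exercise 11.18.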

11.16. Food expenditure in India. In Table 2.8 we have given data on expenditure on food 
and total expenditure for 55 families in India. 

a. Regress expenditure on food on total expenditure, and examine the residuals 
obtained from this regression. 

b. Plot the residuals obtained in (a) against total expenditure and see if you observe 
any systematic pattern. 

*Adapted from G. A. F. Seber, Linear Regression Analysis, John Wiley & Sons, New York, 1977, p. 64.
TABLE 11.7 Passenger Car Mileage Data

Obs.    MPG    SP    HP   VOL     WT
   1   65.4    96    49    89   17.5
   2   56.0    97    55    92   20.0
   3   55.9    97    55    92   20.0
   4   49.0   105    70    92   20.0
   5   46.5    96    53    92   20.0
   6   46.2   105    70    89   20.0
   7   45.4    97    55    92   20.0
   8   59.2    98    62    50   22.5
   9   53.3    98    62    50   22.5
  10   43.4   107    80    94   22.5
  11   41.1   103    73    89   22.5
  12   40.9   113    92    50   22.5
  13   40.9   113    92    99   22.5
  14   40.4   103    73    89   22.5
  15   39.6   100    66    89   22.5
  16   39.3   103    73    89   22.5
  17   38.9   106    78    91   22.5
  18   38.8   113    92    50   22.5
  19   38.2   106    78    91   22.5
  20   42.2   109    90   103   25.0
  21   40.9   110    92    99   25.0
  22   40.7   101    74   107   25.0
  23   40.0   111    95   101   25.0
  24   39.3   105    81    96   25.0
  25   38.8   111    95    89   25.0
  26   38.4   110    92    50   25.0
  27   38.4   110    92   117   25.0
  28   38.4   110    92    99   25.0
  29   46.9    90    52   104   27.5
  30   36.3   112   103   107   27.5
  31   36.1   103    84   114   27.5
  32   36.1   103    84   101   27.5
  33   35.4   111   102    97   27.5
  34   35.3   111   102   113   27.5
  35   35.1   102    81   101   27.5
  36   35.1   106    90    98   27.5
  37   35.0   106    90    88   27.5
  38   33.2   109   102    86   30.0
  39   32.9   109   102    86   30.0
  40   32.3   120   130    92   30.0
  41   32.2   106    95   113   30.0
  42   32.2   106    95   106   30.0
  43   32.2   109   102    92   30.0
  44   32.2   106    95    88   30.0
  45   31.5   105    93   102   30.0
  46   31.5   108   100    99   30.0
  47   31.4   108   100   111   30.0
  48   31.4   107    98   103   30.0
  49   31.2   120   130    86   30.0
  50   33.7   109   115   101   35.0
  51   32.6   109   115   101   35.0
  52   31.3   109   115   101   35.0
  53   31.3   109   115   124   35.0
  54   30.4   133   180   113   35.0
  55   28.9   125   160   113   35.0
  56   28.0   115   130   124   35.0
  57   28.0   102    96    92   35.0
  58   28.0   109   115   101   35.0
  59   28.0   104   100    94   35.0
  60   28.0   105   100   115   35.0
  61   27.7   120   145   111   35.0
  62   25.6   107   120   116   40.0
  63   25.3   114   140   131   40.0
  64   23.9   114   140   123   40.0
  65   23.6   117   150   121   40.0
  66   23.6   122   165    50   40.0
  67   23.6   122   165   114   40.0
  68   23.6   122   165   127   40.0
  69   23.6   122   165   123   40.0
  70   23.5   148   245   112   40.0
  71   23.4   160   280    50   40.0
  72   23.4   121   162   135   40.0
  73   23.1   121   162   132   40.0
  74   22.9   110   140   160   45.0
  75   22.9   110   140   129   45.0
  76   19.5   121   175   129   45.0
  77   18.1   165   322    50   45.0
  78   17.2   140   238   115   45.0
  79   17.0   147   263    50   45.0
  80   16.7   157   295   119   45.0
  81   13.2   130   236   107   55.0

VOL = cubic feet of cab space.
HP = engine horsepower.
MPG = average miles per gallon.
SP = top speed, miles per hour.
WT = vehicle weight, hundreds of pounds.
Obs. = car observation number (names of cars not disclosed).

Source: U.S. Environmental Protection Agency, 1991, Report EPA/AA/CTAB/91-02.
c. If the plot in (b) suggests that there is heteroscedasticity, apply the Park, Glejser, and White tests to find out if the impression of heteroscedasticity observed in (b) is supported by these tests.

d. Obtain White's heteroscedasticity-consistent standard errors and compare those with the OLS standard errors. Decide if it is worth correcting for heteroscedasticity in this example.
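The Park and Glejser tests in part (c) are both auxiliary regressions on the OLS residuals. The sketch below, assuming NumPy, implements the Park regression of $\ln \hat u_i^2$ on $\ln X_i$ and three Glejser-type regressions of $|\hat u_i|$ on $X_i$, $\sqrt{X_i}$, and $1/X_i$, reporting each slope with its conventional $t$ value. It requires $X_i > 0$ and nonzero residuals. The data are simulated (not Table 2.8), with error variance proportional to $X_i^2$, and all function names are hypothetical.

```python
import numpy as np

def ols(y, x):
    """OLS of y on a constant and x; returns coefficients, standard errors, residuals."""
    Z = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    s2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))
    return beta, se, resid

def park_test(x, u):
    """Park: regress ln(u_hat^2) on ln X; returns the slope and its t value."""
    beta, se, _ = ols(np.log(u ** 2), np.log(x))
    return beta[1], beta[1] / se[1]

def glejser_tests(x, u):
    """Glejser: regress |u_hat| on X, sqrt(X) and 1/X; returns slope and t value for each."""
    out = {}
    for name, z in {"X": x, "sqrt(X)": np.sqrt(x), "1/X": 1.0 / x}.items():
        beta, se, _ = ols(np.abs(u), z)
        out[name] = (beta[1], beta[1] / se[1])
    return out

# Hypothetical data: error s.d. proportional to X, so var(u_i) is proportional to X_i^2
rng = np.random.default_rng(7)
x = rng.uniform(1.0, 10.0, size=2000)
y = 1.0 + 2.0 * x + rng.normal(0.0, x)
_, _, u_hat = ols(y, x)
print(park_test(x, u_hat))
print(glejser_tests(x, u_hat))
```

When $\sigma_i^2 \propto X_i^2$, the Park slope should come out near 2, since $\ln \sigma_i^2 = \text{const} + 2 \ln X_i$.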

11.17. Repeat Exercise 11.16, but this time regress the logarithm of expenditure on food 
on the logarithm of total expenditure. If you observe heteroscedasticity in the linear 
model of Exercise 11.16 but not in the log-linear model, what conclusion do you 
draw? Show all the necessary calculations. 

11.18. A shortcut to White’s test. As noted in the text, the White test can consume degrees 
of freedom if there are several regressors and if we introduce all the regressors, 
their squared terms, and their cross products. Therefore, instead of estimating 
regressions like Eq. (11.5.22), why not simply run the following regression: 

$$\hat u_i^2 = \alpha_1 + \alpha_2 \hat Y_i + \alpha_3 \hat Y_i^2 + v_i$$

where $\hat Y_i$ are the estimated $Y$ (i.e., regressand) values from whatever model you are estimating? After all, $\hat Y_i$ is simply a weighted combination of the regressors, with the estimated regression coefficients serving as the weights.

Obtain the R 2 value from the preceding regression and use Eq. (11.5.22) to test 
the hypothesis that there is no heteroscedasticity. 

Apply the preceding test to the food expenditure example of Exercise 11.16. 
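The shortcut needs only the fitted values from the original regression. Below is a minimal NumPy sketch that regresses $\hat u_i^2$ on a constant, $\hat Y_i$, and $\hat Y_i^2$ and returns $n \cdot R^2$. Treating this as approximately $\chi^2$ with 2 df, by analogy with White's test, is an assumption of the sketch. The simulated "food" and "total" series are hypothetical stand-ins, not the Table 2.8 data.

```python
import numpy as np

def white_shortcut(y, X):
    """Regress u_hat^2 on a constant, Y_hat and Y_hat^2; return n * R^2 (compare with chi-square, 2 df)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    y_hat = Z @ beta
    u2 = (y - y_hat) ** 2
    A = np.column_stack([np.ones(n), y_hat, y_hat ** 2])
    g, *_ = np.linalg.lstsq(A, u2, rcond=None)
    e = u2 - A @ g
    r2 = 1.0 - (e @ e) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2          # 5% critical chi-square value with 2 df is about 5.991

# Hypothetical stand-in for the food-expenditure data: spread grows with total expenditure
rng = np.random.default_rng(3)
total = rng.uniform(300.0, 900.0, size=55)
food = 100.0 + 0.4 * total + rng.normal(0.0, 0.1 * total)
print(white_shortcut(food, total))
```

However many regressors the original model has, the auxiliary regression always uses just two, so the test costs only 2 df.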

11.19. Return to the R&D example discussed in Section 11.7 (Exercise 11.10). Repeat the 
example using profits as the regressor. A priori, would you expect your results to be 
different from those using sales as the regressor? Why or why not? 

11.20. Table 11.8 gives data on median salaries of full professors in statistics in research 
universities in the United States for the academic year 2007. 

a. Plot median salaries against years in rank (as a measure of years of experience). 
For the plotting purposes, assume that the median salaries refer to the midpoint 
of years in rank. Thus, the salary $124,578 in the range 4-5 refers to 4.5 years in 
the rank, and so on. For the last group, assume that the range is 31-33. 

b. Consider the following regression models:

$$Y_i = \alpha_1 + \alpha_2 X_i + u_i \quad (1)$$

$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + v_i \quad (2)$$

where $Y$ = median salary, $X$ = years in rank (measured at midpoint of the range), and $u$ and $v$ are the error terms. Can you argue why model (2) might be preferable to model (1)? From the data given, estimate both models.

c. If you observe heteroscedasticity in model (1) but not in model (2), what conclusion would you draw? Show the necessary computations.

d. If heteroscedasticity is observed in model (2), how would you transform the data so that in the transformed model there is no heteroscedasticity?

TABLE 11.8 Median Salaries of Full Professors in Statistics, 2007

Years in Rank  Count      Median
0 to 1             40    $101,478
2 to 3             24     102,400
4 to 5             35     124,578
6 to 7             34     122,850
8 to 9             33     116,900
10 to 14           73     119,465
15 to 19           69     114,900
20 to 24           54     129,072
25 to 30           44     131,704
31 or more         25     143,000

Source: American Statistical Association, "2007 Salary Report."
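Models (1) and (2) can be fitted to Table 11.8 with a polynomial least-squares fit. This NumPy sketch uses the midpoint convention of part (a), taking 0.5, 2.5, ..., 27.5 for the closed ranges and 32 for the last group (assumed range 31-33), and returns the residuals for you to plot against $X$ or feed into any of the tests above. `fit_poly` is a hypothetical helper name.

```python
import numpy as np

# Table 11.8: X = midpoint of years in rank (last group taken as 31-33), Y = median salary ($)
X = np.array([0.5, 2.5, 4.5, 6.5, 8.5, 12.0, 17.0, 22.0, 27.5, 32.0])
Y = np.array([101478, 102400, 124578, 122850, 116900,
              119465, 114900, 129072, 131704, 143000], dtype=float)

def fit_poly(deg):
    """OLS polynomial fit of Y on X; returns coefficients (highest power first) and residuals."""
    coefs = np.polyfit(X, Y, deg)
    resid = Y - np.polyval(coefs, X)
    return coefs, resid

lin_coefs, lin_resid = fit_poly(1)      # model (1): Y = a1 + a2 X + u
quad_coefs, quad_resid = fit_poly(2)    # model (2): Y = b1 + b2 X + b3 X^2 + v
for label, r in (("linear", lin_resid), ("quadratic", quad_resid)):
    print(label, np.round(r, 1))        # inspect (or plot) residuals against X
```

Note that `np.polyfit` returns the coefficients highest power first, so `quad_coefs[0]` is the estimate of $\beta_3$.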

11.21. You are given the following data:

RSS1 based on the first 30 observations = 55, df = 25
RSS2 based on the last 30 observations = 140, df = 25

Carry out the Goldfeld-Quandt test of heteroscedasticity at the 5 percent level of significance.
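The arithmetic of the Goldfeld-Quandt ratio is short enough to show directly. The sketch below (plain Python, hypothetical function name) divides the residual variance of the subsample suspected of having the larger variance by that of the other subsample; the resulting $\lambda$ is then compared with the 5 percent $F$ value for (25, 25) df from an $F$ table, which is left to the reader.

```python
def goldfeld_quandt(rss1, df1, rss2, df2):
    """Goldfeld-Quandt ratio (RSS2/df2) / (RSS1/df1); RSS2 comes from the subsample
    suspected of having the larger variance. Under homoscedasticity it follows
    the F distribution with (df2, df1) df."""
    return (rss2 / df2) / (rss1 / df1)

lam = goldfeld_quandt(rss1=55, df1=25, rss2=140, df2=25)
print(f"lambda = {lam:.3f}")   # prints lambda = 2.545
```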

11.22. Table 11.9 gives data on percent change per year for stock prices ($Y$) and consumer prices ($X$) for a cross section of 20 countries.

a. Plot the data in a scattergram.

b. Regress $Y$ on $X$ and examine the residuals from this regression. What do you observe?

c. Since the data for Chile seem atypical (outlier?), repeat the regression in (b), dropping the data on Chile. Now examine the residuals from this regression. What do you observe?

d. If on the basis of the results in (b) you conclude that there was heteroscedasticity in error variance but on the basis of the results in (c) you reverse your conclusion, what general conclusions do you draw?


TABLE 11.9 Stock and Consumer Prices, Post-World War II Period (through 1969)

Rate of change, % per year: Y = stock prices, X = consumer prices

Country                    Y       X
1. Australia             5.0     4.3
2. Austria              11.1     4.6
3. Belgium               3.2     2.4
4. Canada                7.9     2.4
5. Chile                25.5    26.4
6. Denmark               3.8     4.2
7. Finland              11.1     5.5
8. France                9.9     4.7
9. Germany              13.3     2.2
10. India                1.5     4.0
11. Ireland              6.4     4.0
12. Israel               8.9     8.4
13. Italy                8.1     3.3
14. Japan               13.5     4.7
15. Mexico               4.7     5.2
16. Netherlands          7.5     3.6
17. New Zealand          4.7     3.6
18. Sweden               8.0     4.0
19. United Kingdom       7.5     3.9
20. United States        9.0     2.1

Source: Phillip Cagan, Common Stock Values and Inflation: The Historical Record of Many Countries, National Bureau of Economic Research, Suppl., March 1974, Table 1, p. 4.
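Parts (b) and (c) can be run side by side. This NumPy sketch encodes Table 11.9, fits $Y$ on $X$ with and without Chile, and keeps both residual vectors so they can be plotted or tested; `ols_fit` is a hypothetical helper name.

```python
import numpy as np

countries = ["Australia", "Austria", "Belgium", "Canada", "Chile", "Denmark", "Finland",
             "France", "Germany", "India", "Ireland", "Israel", "Italy", "Japan", "Mexico",
             "Netherlands", "New Zealand", "Sweden", "United Kingdom", "United States"]
Y = np.array([5.0, 11.1, 3.2, 7.9, 25.5, 3.8, 11.1, 9.9, 13.3, 1.5,
              6.4, 8.9, 8.1, 13.5, 4.7, 7.5, 4.7, 8.0, 7.5, 9.0])   # stock prices, % per year
X = np.array([4.3, 4.6, 2.4, 2.4, 26.4, 4.2, 5.5, 4.7, 2.2, 4.0,
              4.0, 8.4, 3.3, 4.7, 5.2, 3.6, 3.6, 4.0, 3.9, 2.1])    # consumer prices, % per year

def ols_fit(x, y):
    """OLS of y on a constant and x; returns (intercept, slope) and residuals."""
    Z = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta, y - Z @ beta

beta_all, resid_all = ols_fit(X, Y)                    # part (b): all 20 countries
keep = np.array([c != "Chile" for c in countries])
beta_wo, resid_wo = ols_fit(X[keep], Y[keep])          # part (c): Chile dropped
print("with Chile:   ", np.round(beta_all, 3))
print("without Chile:", np.round(beta_wo, 3))
```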
11.23. Table 11.10 from the website gives salary and related data on 447 executives of Fortune 500 companies. Data include salary = 1999 salary and bonuses; totcomp = 1999 CEO total compensation; tenure = number of years as CEO (0 if less than 6 months); age = age of CEO; sales = total 1998 sales revenue of the firm; profits = 1998 profits for the firm; and assets = total assets of the firm in 1998.

a. Estimate the following regression from these data and obtain the Breusch-Pagan-Godfrey statistic to check for heteroscedasticity:

$$\text{salary}_i = \beta_1 + \beta_2\,\text{tenure}_i + \beta_3\,\text{age}_i + \beta_4\,\text{sales}_i + \beta_5\,\text{profits}_i + \beta_6\,\text{assets}_i + u_i$$

Does there seem to be a problem with heteroscedasticity?

b. Now create a second model using ln(Salary) as the dependent variable. Is there any improvement in the heteroscedasticity?

c. Create scattergrams of salary vs. each of the independent variables. Can you discern which variable(s) is (are) contributing to the issue? What suggestions would you make now to address this? What is your final model?
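For part (a), here is a minimal NumPy sketch of the Breusch-Pagan-Godfrey statistic in its usual textbook form: divide the squared OLS residuals by $\tilde\sigma^2 = \sum \hat u_i^2 / n$, regress the result on a constant and the chosen $Z$ variables (by default the model's own regressors), and take $\Theta = \text{ESS}/2$, which is asymptotically $\chi^2$ with df equal to the number of $Z$ variables when the errors are normal. The simulated data merely stand in for Table 11.10, and `bpg_test` is a hypothetical name.

```python
import numpy as np

def bpg_test(y, X, Z=None):
    """Breusch-Pagan-Godfrey statistic Theta = ESS/2 and its df.
    X and Z exclude the constant; Z defaults to the model's regressors."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    u = y - Xc @ beta
    sigma2_tilde = (u @ u) / n                   # ML estimator: divisor n, not n - k
    p = u ** 2 / sigma2_tilde
    Zc = Xc if Z is None else np.column_stack([np.ones(n), Z])
    g, *_ = np.linalg.lstsq(Zc, p, rcond=None)
    ess = np.sum((Zc @ g - p.mean()) ** 2)       # explained sum of squares
    return ess / 2.0, Zc.shape[1] - 1

# Hypothetical data with five regressors (standing in for tenure, age, sales, profits, assets)
rng = np.random.default_rng(11)
n = 447
X = rng.uniform(1.0, 10.0, size=(n, 5))
y = 10.0 + X @ np.array([0.5, 0.2, 1.0, 0.8, 0.3]) + rng.normal(0.0, X[:, 2])
theta, df = bpg_test(y, X)
print(theta, df)      # 5% critical chi-square value with 5 df is about 11.07
```

For part (b), the same function can be applied with `np.log(y)` in place of `y`, provided all salaries are positive.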


Appendix 11A 


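11A.1 Proof of Equation (11.2.2)

From Appendix 3A, Section 3A.3, we have

$$\operatorname{var}(\hat\beta_2) = E\left(k_1^2 u_1^2 + k_2^2 u_2^2 + \cdots + k_n^2 u_n^2 + 2\ \text{cross-product terms}\right)$$

$$= E\left(k_1^2 u_1^2 + k_2^2 u_2^2 + \cdots + k_n^2 u_n^2\right)$$

since the expectations of the cross-product terms are zero because of the assumption of no serial correlation,

$$\operatorname{var}(\hat\beta_2) = k_1^2 E(u_1^2) + k_2^2 E(u_2^2) + \cdots + k_n^2 E(u_n^2)$$

since the $k_i$ are known. (Why?)

$$\operatorname{var}(\hat\beta_2) = k_1^2 \sigma_1^2 + k_2^2 \sigma_2^2 + \cdots + k_n^2 \sigma_n^2 = \sum k_i^2 \sigma_i^2$$

$$= \sum \left[\left(\frac{x_i}{\sum x_i^2}\right)^2 \sigma_i^2\right] \qquad \text{since } k_i = \frac{x_i}{\sum x_i^2}$$

$$= \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2} \qquad (11.2.2)$$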


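11A.2 The Method of Weighted Least Squares

To illustrate the method, we use the two-variable model $Y_i = \beta_1 + \beta_2 X_i + u_i$. The unweighted least-squares method minimizes

$$\sum \hat u_i^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2 \quad (1)$$

to obtain the estimates, whereas the weighted least-squares method minimizes the weighted residual sum of squares:

$$\sum w_i \hat u_i^{*2} = \sum w_i (Y_i - \hat\beta_1^* - \hat\beta_2^* X_i)^2 \quad (2)$$

where $\hat\beta_1^*$ and $\hat\beta_2^*$ are the weighted least-squares estimators and where the weights $w_i$ are such that

$$w_i = \frac{1}{\sigma_i^2} \quad (3)$$

that is, the weights are inversely proportional to the variance of $u_i$ or $Y_i$ conditional upon the given $X_i$, it being understood that $\operatorname{var}(u_i \mid X_i) = \operatorname{var}(Y_i \mid X_i) = \sigma_i^2$.

Differentiating Eq. (2) with respect to $\hat\beta_1^*$ and $\hat\beta_2^*$, we obtain

$$\frac{\partial \sum w_i \hat u_i^{*2}}{\partial \hat\beta_1^*} = 2 \sum w_i (Y_i - \hat\beta_1^* - \hat\beta_2^* X_i)(-1)$$

$$\frac{\partial \sum w_i \hat u_i^{*2}}{\partial \hat\beta_2^*} = 2 \sum w_i (Y_i - \hat\beta_1^* - \hat\beta_2^* X_i)(-X_i)$$

Setting the preceding expressions equal to zero, we obtain the following two normal equations:

$$\sum w_i Y_i = \hat\beta_1^* \sum w_i + \hat\beta_2^* \sum w_i X_i \quad (4)$$

$$\sum w_i X_i Y_i = \hat\beta_1^* \sum w_i X_i + \hat\beta_2^* \sum w_i X_i^2 \quad (5)$$

Notice the similarity between these normal equations and the normal equations of the unweighted least squares.

Solving these equations simultaneously, we obtain

$$\hat\beta_1^* = \bar Y^* - \hat\beta_2^* \bar X^* \quad (6)$$

and

$$\hat\beta_2^* = \frac{\left(\sum w_i\right)\left(\sum w_i X_i Y_i\right) - \left(\sum w_i X_i\right)\left(\sum w_i Y_i\right)}{\left(\sum w_i\right)\left(\sum w_i X_i^2\right) - \left(\sum w_i X_i\right)^2} \quad (11.3.8) = (7)$$

The variance of $\hat\beta_2^*$ shown in Eq. (11.3.9) can be obtained in the manner of the variance of $\hat\beta_2$ shown in Appendix 3A, Section 3A.3.

Note: $\bar Y^* = \sum w_i Y_i / \sum w_i$ and $\bar X^* = \sum w_i X_i / \sum w_i$. As can be readily verified, these weighted means coincide with the usual or unweighted means $\bar Y$ and $\bar X$ when $w_i = w$, a constant, for all $i$.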
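The derivation can be checked numerically in three ways that should agree: solving normal equations (4) and (5) as a 2 x 2 linear system, using the closed forms (6) and (7), and running OLS on data in which every observation is multiplied by $\sqrt{w_i}$ (which minimizes the same weighted sum of squares). The sketch assumes NumPy; the data, $\sigma_i = 0.2X_i$, and the function names are hypothetical.

```python
import numpy as np

def wls_via_normal_equations(X, Y, w):
    """Solve normal equations (4)-(5) as a 2x2 linear system in (beta1*, beta2*)."""
    X, Y, w = (np.asarray(a, dtype=float) for a in (X, Y, w))
    A = np.array([[w.sum(),       (w * X).sum()],
                  [(w * X).sum(), (w * X ** 2).sum()]])
    b = np.array([(w * Y).sum(), (w * X * Y).sum()])
    return np.linalg.solve(A, b)

def wls_closed_form(X, Y, w):
    """Eqs. (6)-(7): beta2* from the weighted sums, beta1* from the weighted means."""
    X, Y, w = (np.asarray(a, dtype=float) for a in (X, Y, w))
    sw, swx, swy = w.sum(), (w * X).sum(), (w * Y).sum()
    b2 = (sw * (w * X * Y).sum() - swx * swy) / (sw * (w * X ** 2).sum() - swx ** 2)
    b1 = swy / sw - b2 * swx / sw              # Ybar* - beta2* Xbar*
    return np.array([b1, b2])

def wls_transformed_ols(X, Y, w):
    """Equivalent route: OLS after multiplying each observation (constant included) by sqrt(w_i)."""
    X, Y, w = (np.asarray(a, dtype=float) for a in (X, Y, w))
    r = np.sqrt(w)
    Z = np.column_stack([r, r * X])
    beta, *_ = np.linalg.lstsq(Z, r * Y, rcond=None)
    return beta

rng = np.random.default_rng(5)
X = rng.uniform(1.0, 20.0, size=30)
sigma = 0.2 * X                                # hypothetical sigma_i
Y = 3.0 + 1.5 * X + rng.normal(0.0, sigma)
w = 1.0 / sigma ** 2                           # weights as in Eq. (3)
print(wls_via_normal_equations(X, Y, w), wls_closed_form(X, Y, w), wls_transformed_ols(X, Y, w))
```

Setting all weights equal reproduces ordinary least squares, in line with the Note above.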


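11A.3 Proof that $E(\hat\sigma^2) \neq \sigma^2$ in the Presence of Heteroscedasticity

Consider the two-variable model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i \quad (1)$$

where $\operatorname{var}(u_i) = \sigma_i^2$.

Now

$$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n-2} = \frac{\sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2}{n-2} = \frac{\sum (\beta_1 + \beta_2 X_i + u_i - \hat\beta_1 - \hat\beta_2 X_i)^2}{n-2}$$

$$= \frac{\sum \left[-(\hat\beta_1 - \beta_1) - (\hat\beta_2 - \beta_2) X_i + u_i\right]^2}{n-2} \quad (2)$$

Noting that $(\hat\beta_1 - \beta_1) = -(\hat\beta_2 - \beta_2)\bar X + \bar u$, and substituting this into Eq. (2) and taking expectations on both sides, we get:

$$E(\hat\sigma^2) = \frac{1}{n-2}\left\{-\sum x_i^2 \operatorname{var}(\hat\beta_2) + E\left[\sum (u_i - \bar u)^2\right]\right\}$$

$$= \frac{1}{n-2}\left[-\frac{\sum x_i^2 \sigma_i^2}{\sum x_i^2} + \frac{(n-1)\sum \sigma_i^2}{n}\right] \quad (3)$$

where use is made of Eq. (11.2.2).

As you can see from Eq. (3), if there is homoscedasticity, that is, $\sigma_i^2 = \sigma^2$ for each $i$, then $E(\hat\sigma^2) = \sigma^2$. Therefore, the expected value of the conventionally computed $\hat\sigma^2 = \sum \hat u_i^2/(n-2)$ will not, in general, be equal to the true $\sigma^2$ in the presence of heteroscedasticity.¹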
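Eq. (3) can be cross-checked without simulation. Because $\hat u = Mu$ with the residual-maker matrix $M = I - Z(Z'Z)^{-1}Z'$, and the $u_i$ are uncorrelated, $E(\hat u'\hat u) = \operatorname{tr}(M\Sigma)$ with $\Sigma = \operatorname{diag}(\sigma_i^2)$. The NumPy sketch below computes both expressions for a hypothetical variance pattern $\sigma_i^2 = 0.5X_i^2$; the function names are illustrative only.

```python
import numpy as np

def expected_sigma2_hat_eq3(X, sig2):
    """Right-hand side of Eq. (3): E(sigma_hat^2) in the two-variable model."""
    X, sig2 = np.asarray(X, dtype=float), np.asarray(sig2, dtype=float)
    n = X.size
    x = X - X.mean()
    return (-(x ** 2 @ sig2) / (x @ x) + (n - 1) * sig2.sum() / n) / (n - 2)

def expected_sigma2_hat_trace(X, sig2):
    """Same quantity via E(u_hat' u_hat) = trace(M Sigma), M = I - Z (Z'Z)^(-1) Z'."""
    X, sig2 = np.asarray(X, dtype=float), np.asarray(sig2, dtype=float)
    n = X.size
    Z = np.column_stack([np.ones(n), X])
    M = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return np.trace(M @ np.diag(sig2)) / (n - 2)

X = np.arange(1.0, 11.0)
sig2 = 0.5 * X ** 2                     # hypothetical heteroscedastic variances
print(expected_sigma2_hat_eq3(X, sig2), expected_sigma2_hat_trace(X, sig2), sig2.mean())
```

With equal variances the formula collapses to $\sigma^2$, as the text states.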

11A.4 White's Robust Standard Errors 


To give you some idea about White's heteroscedasticity-corrected standard errors, consider the two-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad \operatorname{var}(u_i) = \sigma_i^2 \quad (1)$$

As shown in Eq. (11.2.2),

$$\operatorname{var}(\hat\beta_2) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2} \quad (2)$$

Since $\sigma_i^2$ are not directly observable, White suggests using $\hat u_i^2$, the squared residual for each $i$, in place of $\sigma_i^2$ and estimating the $\operatorname{var}(\hat\beta_2)$ as follows:

$$\operatorname{var}(\hat\beta_2) = \frac{\sum x_i^2 \hat u_i^2}{\left(\sum x_i^2\right)^2} \quad (3)$$

White has shown that Eq. (3) is a consistent estimator of Eq. (2), that is, as the sample size increases indefinitely, Eq. (3) converges to Eq. (2).²

Incidentally, note that if your software package does not contain White's robust standard error procedure, you can do it as shown in Eq. (3) by first running the usual OLS regression, obtaining the residuals from this regression, and then using formula (3).
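The two-step recipe just described takes only a few lines. This NumPy sketch fits the two-variable model by OLS, then returns both the conventional standard error of $\hat\beta_2$ and White's version from Eq. (3). The simulated data (error s.d. $0.3X_i$) and the function name are hypothetical.

```python
import numpy as np

def white_two_variable(X, Y):
    """OLS slope with White (Eq. 3) and conventional standard errors."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n = X.size
    x = X - X.mean()
    b2 = (x @ (Y - Y.mean())) / (x @ x)
    b1 = Y.mean() - b2 * X.mean()
    u = Y - b1 - b2 * X                               # OLS residuals
    var_white = (x ** 2 @ u ** 2) / (x @ x) ** 2      # Eq. (3)
    var_conv = (u @ u / (n - 2)) / (x @ x)            # usual sigma_hat^2 / sum x_i^2
    return b2, np.sqrt(var_white), np.sqrt(var_conv)

rng = np.random.default_rng(2)
X = rng.uniform(1.0, 10.0, size=100)
Y = 1.0 + 0.5 * X + rng.normal(0.0, 0.3 * X)          # hypothetical heteroscedastic errors
print(white_two_variable(X, Y))
```

Eq. (3) is the slope element of the matrix form $(Z'Z)^{-1} Z' \operatorname{diag}(\hat u_i^2) Z (Z'Z)^{-1}$ often reported by software, so the two should match.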

White's procedure can be generalized to the $k$-variable regression model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \quad (4)$$

The variance of any partial regression coefficient, say $\hat\beta_j$, is obtained as follows:

$$\operatorname{var}(\hat\beta_j) = \frac{\sum \hat w_{ji}^2 \hat u_i^2}{\left(\sum \hat w_{ji}^2\right)^2} \quad (5)$$

where $\hat u_i$ are the residuals obtained from the (original) regression (4) and $\hat w_j$ are the residuals obtained from the (auxiliary) regression of the regressor $X_j$ on the remaining regressors in Eq. (4).

Obviously, this is a time-consuming procedure, for you will have to estimate Eq. (5) for each $X$ variable. Of course, all this labor can be avoided if you have a statistical package that does this routinely. Packages such as PC-GIVE, EViews, MICROFIT, SHAZAM, STATA, and LIMDEP now obtain White's heteroscedasticity-robust standard errors very easily.
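The auxiliary-regression route of Eq. (5) and the matrix "sandwich" form used by most software give the same numbers, because the residuals $\hat w_j$ carry exactly the part of $X_j$ that identifies $\hat\beta_j$. The NumPy sketch below computes both on simulated data so the agreement can be seen; the design, coefficients, and function names are hypothetical.

```python
import numpy as np

def white_var_auxiliary(y, X, j):
    """Eq. (5). X includes the constant column; j indexes the coefficient of interest."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta                                   # residuals from regression (4)
    others = np.delete(X, j, axis=1)
    g, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    w = X[:, j] - others @ g                           # residuals of X_j on the other regressors
    return (w ** 2 @ u ** 2) / (w @ w) ** 2

def white_var_sandwich(y, X):
    """All coefficients at once: (X'X)^-1 X' diag(u_hat^2) X (X'X)^-1, diagonal only."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (u ** 2)[:, None])
    return np.diag(XtX_inv @ meat @ XtX_inv)

rng = np.random.default_rng(9)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -0.2, 0.8]) + rng.normal(0.0, 1.0 + np.abs(X[:, 1]))
print([white_var_auxiliary(y, X, j) for j in range(4)])
print(white_var_sandwich(y, X))
```

Taking square roots of either set of variances gives White's robust standard errors.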

¹Further details can be obtained from Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, pp. 276-278.

²To be more precise, $n$ times Eq. (3) converges in probability to $E[(X_i - \mu_X)^2 u_i^2]/(\sigma_X^2)^2$, which is the probability limit of $n$ times Eq. (2), where $n$ is the sample size, $\mu_X$ is the expected value of $X$, and $\sigma_X^2$ is the (population) variance of $X$. For more details, see Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, South-Western Publishing, 2000, p. 250.






Chapter 12

Autocorrelation: What Happens If the Error Terms Are Correlated?


The reader may recall that there are generally three types of data that are available for empirical analysis: (1) cross section, (2) time series, and (3) combination of cross section and time series, also known as pooled data. In developing the classical linear regression model (CLRM) in Part 1 we made several assumptions, which were discussed
in Section 7.1. However, we noted that not all of these assumptions would hold in 
every type of data. As a matter of fact, we saw in the previous chapter that the assumption 
of homoscedasticity, or equal error variance, may not always be tenable in cross- 
sectional data. In other words, cross-sectional data are often plagued by the problem of 
heteroscedasticity. 

However, in cross-section studies, data are often collected on the basis of a random 
sample of cross-sectional units, such as households (in a consumption function analysis) or 
firms (in an investment study analysis) so that there is no prior reason to believe that the 
error term pertaining to one household or firm is correlated with the error term of another 
household or firm. If by chance such a correlation is observed in cross-sectional units, it is 
called spatial autocorrelation, that is, correlation in space rather than over time. However, 
it is important to remember that, in cross-sectional analysis, the ordering of the data must 
have some logic, or economic interest, to make sense of any determination of whether 
(spatial) autocorrelation is present or not. 

The situation, however, is likely to be very different if we are dealing with time series 
data, for the observations in such data follow a natural ordering over time so that successive 
observations are likely to exhibit intercorrelations, especially if the time interval between 
successive observations is short, such as a day, a week, or a month rather than a year. If you 
observe stock price indexes, such as the Dow Jones or S&P 500, over successive days, it is 
not unusual to find that these indexes move up or down for several days in succession. 
Obviously, in situations like this, the assumption of no auto-, or serial, correlation in the 
error terms that underlies the CLRM will be violated. 

In this chapter we take a critical look at this assumption with a view to answering the 
following questions: 




1. What is the nature of autocorrelation? 

2. What are the theoretical and practical consequences of autocorrelation? 
3. Since the assumption of no autocorrelation relates to the unobservable disturbances $u_t$, how does one know that there is autocorrelation in any given situation? Notice that we now use the subscript $t$ to emphasize that we are dealing with time series data.

4. How does one remedy the problem of autocorrelation? 

The reader will find this chapter in many ways similar to the preceding chapter on heteroscedasticity in that under both heteroscedasticity and autocorrelation the usual OLS estimators, although linear, unbiased, and asymptotically (i.e., in large samples) normally distributed,¹ are no longer minimum variance among all linear unbiased estimators. In short, they are not efficient relative to other linear and unbiased estimators. Put differently, they may not be best linear unbiased estimators (BLUE). As a result, the usual $t$, $F$, and $\chi^2$ tests may not be valid.

12.1 The Nature of the Problem 


The term autocorrelation may be defined as “correlation between members of series of observations ordered in time [as in time series data] or space [as in cross-sectional data].”² In the regression context, the classical linear regression model assumes that such autocorrelation does not exist in the disturbances $u_i$. Symbolically,

$$\operatorname{cov}(u_i, u_j \mid X_i, X_j) = E(u_i u_j) = 0 \qquad i \neq j \quad (3.2.5)$$

Put simply, the classical model assumes that the disturbance term relating to any observation is not influenced by the disturbance term relating to any other observation. For example, if we are dealing with quarterly time series data involving the regression of output on
labor and capital inputs and if, say, there is a labor strike affecting output in one quarter, 
there is no reason to believe that this disruption will be carried over to the next quarter. That 
is, if output is lower this quarter, there is no reason to expect it to be lower next quarter. 
Similarly, if we are dealing with cross-sectional data involving the regression of family 
consumption expenditure on family income, the effect of an increase of one family’s income 
on its consumption expenditure is not expected to affect the consumption expenditure of 
another family. 

However, if there is such a dependence, we have autocorrelation. Symbolically,

$$E(u_i u_j) \neq 0 \qquad i \neq j \quad (12.1.1)$$

In this situation, the disruption caused by a strike this quarter may very well affect output 
next quarter, or the increases in the consumption expenditure of one family may very well 
prompt another family to increase its consumption expenditure if it wants to keep up with 
the Joneses. 

Before we find out why autocorrelation exists, it is essential to clear up some terminological questions. Although it is now a common practice to treat the terms autocorrelation
and serial correlation synonymously, some authors prefer to distinguish the two terms. For 
example, Tintner defines autocorrelation as “lag correlation of a given series with itself, 
lagged by a number of time units,” whereas he reserves the term serial correlation to define 


¹On this, see William H. Greene, Econometric Analysis, 4th ed., Prentice Hall, NJ, 2000, Chapter 11, and Paul A. Ruud, An Introduction to Classical Econometric Theory, Oxford University Press, 2000, Chapter 19.

²Maurice G. Kendall and William R. Buckland, A Dictionary of Statistical Terms, Hafner Publishing Company, New York, 1971, p. 8.
“lag correlation between two different series.”³ Thus, correlation between two time series such as $u_1, u_2, \ldots, u_{10}$ and $u_2, u_3, \ldots, u_{11}$, where the former is the latter series lagged by one time period, is autocorrelation, whereas correlation between time series such as $u_1, u_2, \ldots, u_{10}$ and $v_2, v_3, \ldots, v_{11}$, where $u$ and $v$ are two different time series, is called serial correlation. Although the distinction between the two terms may be useful, in this
book we shall treat them synonymously. 

Let us visualize some of the plausible patterns of auto- and nonautocorrelation, which are given in Figure 12.1. Figures 12.1a to d show that there is a discernible pattern among the $u$'s. Figure 12.1a shows a cyclical pattern; Figures 12.1b and c suggest an upward or downward linear trend in the disturbances; whereas Figure 12.1d indicates that both linear and quadratic trend terms are present in the disturbances. Only Figure 12.1e indicates no systematic pattern, supporting the nonautocorrelation assumption of the classical linear regression model.

The natural question is: Why does serial correlation occur? There are several reasons, 
some of which are as follows: 

Inertia 

A salient feature of most economic time series is inertia, or sluggishness. As is well known, 
time series such as GNP, price indexes, production, employment, and unemployment exhibit 
(business) cycles. Starting at the bottom of the recession, when economic recovery starts, 
most of these series start moving upward. In this upswing, the value of a series at one point 
in time is greater than its previous value. Thus there is a “momentum” built into them, and 
it continues until something happens (e.g., increase in interest rate or taxes or both) to slow 
them down. Therefore, in regressions involving time series data, successive observations are 
likely to be interdependent. 

Specification Bias: Excluded Variables Case 

In empirical analysis the researcher often starts with a plausible regression model that may not be the most “perfect” one. After the regression analysis, the researcher does the postmortem to find out whether the results accord with a priori expectations. If not, surgery is begun. For example, the researcher may plot the residuals $\hat u_t$ obtained from the fitted regression and may observe patterns such as those shown in Figure 12.1a to d. These residuals (which are proxies for $u_t$) may suggest that some variables that were originally candidates but were not included in the model for a variety of reasons should be included. This is the case of excluded variable specification bias. Often the inclusion of such variables removes the correlation pattern observed among the residuals. For example, suppose
we have the following demand model: 

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + u_t \qquad (12.1.2)$$

where $Y$ = quantity of beef demanded, $X_2$ = price of beef, $X_3$ = consumer income, $X_4$ = price of pork, and $t$ = time.⁴ However, for some reason we run the following regression:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + v_t \qquad (12.1.3)$$

Now if Eq. (12.1.2) is the “correct” model or the “truth” or true relation, running Eq. (12.1.3) is tantamount to letting $v_t = \beta_4 X_{4t} + u_t$. And to the extent the price of pork affects the consumption of beef, the error or disturbance term $v$ will reflect a systematic pattern, thus creating (false) autocorrelation. A simple test of this would be to run both Eqs. (12.1.2) and (12.1.3) and see whether autocorrelation, if any, observed in model (12.1.3) disappears when model (12.1.2) is run.⁵ The actual mechanics of detecting autocorrelation will be discussed in Section 12.6, where we will show that a plot of the residuals from regressions (12.1.2) and (12.1.3) will often shed considerable light on serial correlation.

3 Gerhard Tintner, Econometrics, John Wiley & Sons, New York, 1965.

4 As a matter of convention, we shall use the subscript $t$ to denote time series data and the usual subscript $i$ for cross-sectional data.

FIGURE 12.1 (residual plots; axes labeled $u$, $\hat u$)


5 If it is found that the real problem is one of specification bias, not autocorrelation, then as will be shown in Chapter 13, the OLS estimators of the parameters in Eq. (12.1.3) may be biased as well as inconsistent.
FIGURE 12.2 Specification bias: incorrect functional form (horizontal axis: output).

Specification Bias: Incorrect Functional Form 

Suppose the “true” or correct model in a cost-output study is as follows: 

$$\text{Marginal cost}_i = \beta_1 + \beta_2\,\text{output}_i + \beta_3\,\text{output}_i^2 + u_i \qquad (12.1.4)$$

but we fit the following model:

$$\text{Marginal cost}_i = \alpha_1 + \alpha_2\,\text{output}_i + v_i \qquad (12.1.5)$$

The marginal cost curve corresponding to the “true” model is shown in Figure 12.2 along 
with the “incorrect” linear cost curve. 

As Figure 12.2 shows, between points A and B the linear marginal cost curve will consistently overestimate the true marginal cost, whereas beyond these points it will consistently underestimate the true marginal cost. This result is to be expected, because the disturbance term $v_i$ is, in fact, equal to $\beta_3\,\text{output}_i^2 + u_i$, and hence will catch the systematic effect of the $\text{output}^2$ term on marginal cost. In this case, $v_i$ will reflect autocorrelation because of the use of an incorrect functional form. In Chapter 13 we will consider several methods of detecting specification bias.

Cobweb Phenomenon 

The supply of many agricultural commodities reflects the so-called cobweb phenomenon, 
where supply reacts to price with a lag of one time period because supply decisions 
take time to implement (the gestation period). Thus, at the beginning of this year’s planting 
of crops, farmers are influenced by the price prevailing last year, so that their supply 
function is 



$$\text{Supply}_t = \beta_1 + \beta_2 P_{t-1} + u_t \qquad (12.1.6)$$

Suppose at the end of period $t$, price $P_t$ turns out to be lower than $P_{t-1}$. Therefore, in period $t+1$ farmers may very well decide to produce less than they did in period $t$. Obviously, in this situation the disturbances $u_t$ are not expected to be random because if the farmers overproduce in year $t$, they are likely to reduce their production in $t+1$, and so on, leading to a cobweb pattern.

Lags 

In a time series regression of consumption expenditure on income, it is not uncommon to 
find that the consumption expenditure in the current period depends, among other things, 
on the consumption expenditure of the previous period. That is,

$$\text{Consumption}_t = \beta_1 + \beta_2\,\text{income}_t + \beta_3\,\text{consumption}_{t-1} + u_t \qquad (12.1.7)$$

A regression such as Eq. (12.1.7) is known as autoregression because one of the explanatory variables is the lagged value of the dependent variable. (We shall study such models in Chapter 17.) The rationale for a model such as Eq. (12.1.7) is simple. Consumers do not change their consumption habits readily for psychological, technological, or institutional reasons. Now if we neglect the lagged term in Eq. (12.1.7), the resulting error term will reflect a systematic pattern due to the influence of lagged consumption on current consumption.

“Manipulation” of Data

In empirical analysis, the raw data are often “manipulated.” For example, in time series regressions involving quarterly data, such data are usually derived from the monthly data by simply adding three monthly observations and dividing the sum by 3. This averaging introduces smoothness into the data by dampening the fluctuations in the monthly data. Therefore, the graph plotting the quarterly data looks much smoother than the monthly data, and this smoothness may itself lead to a systematic pattern in the disturbances, thereby introducing autocorrelation. Another source of manipulation is interpolation or extrapolation of data. For example, the Census of Population is conducted every 10 years in this country, the last being in 2000 and the one before that in 1990. Now if there is a need to obtain data for some year within the intercensus period 1990-2000, the common practice is to interpolate on the basis of some ad hoc assumptions. All such data “massaging” techniques might impose upon the data a systematic pattern that might not exist in the original data.⁶
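The interpolation point can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not a procedure from the text: it treats the benchmark (census-type) values as pure white noise, fills in the intermediate periods by straight-line interpolation (one possible ad hoc rule), and compares the first-order sample autocorrelation before and after interpolation. The benchmark spacing of 10 periods, the sample size, the seed, and the helper name `lag1_autocorr` are all arbitrary, hypothetical choices.

```python
import numpy as np


def lag1_autocorr(series):
    """Sample lag-1 autocorrelation: sum(d_t * d_{t-1}) / sum(d_t^2), d = deviations from the mean."""
    d = np.asarray(series, dtype=float)
    d = d - d.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))


rng = np.random.default_rng(12)

# Hypothetical benchmark observations every 10 periods: pure white noise, no serial correlation.
census_times = np.arange(0, 10001, 10)           # 1,001 benchmark dates
census_values = rng.normal(size=census_times.size)

# Fill in every intermediate period by straight-line interpolation (one possible ad hoc rule).
all_times = np.arange(0, 10001)
interpolated = np.interp(all_times, census_times, census_values)

r_census = lag1_autocorr(census_values)
r_interp = lag1_autocorr(interpolated)
print(f"lag-1 autocorrelation, benchmark data:    {r_census:+.3f}")
print(f"lag-1 autocorrelation, interpolated data: {r_interp:+.3f}")
```

The benchmark series should show essentially no first-order autocorrelation, whereas the interpolated series is very strongly autocorrelated, even though the interpolation added no new information.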

Data Transformation 

As an example of this, consider the following model: 

$$Y_t = \beta_1 + \beta_2 X_t + u_t \qquad (12.1.8)$$

where, say, $Y$ = consumption expenditure and $X$ = income. Since Eq. (12.1.8) holds true at every time period, it holds true also in the previous time period, $(t-1)$. So, we can write Eq. (12.1.8) as

$$Y_{t-1} = \beta_1 + \beta_2 X_{t-1} + u_{t-1} \qquad (12.1.9)$$

$Y_{t-1}$, $X_{t-1}$, and $u_{t-1}$ are known as the lagged values of $Y$, $X$, and $u$, respectively, here lagged by one period. We will see the importance of the lagged values later in the chapter as well as in several places in the text.

Now if we subtract Eq. (12.1.9) from Eq. (12.1.8), we obtain

$$\Delta Y_t = \beta_2 \Delta X_t + \Delta u_t \qquad (12.1.10)$$

where $\Delta$, known as the first difference operator, tells us to take successive differences of the variables in question. Thus, $\Delta Y_t = (Y_t - Y_{t-1})$, $\Delta X_t = (X_t - X_{t-1})$, and $\Delta u_t = (u_t - u_{t-1})$. For empirical purposes, we write Eq. (12.1.10) as

$$\Delta Y_t = \beta_2 \Delta X_t + v_t \qquad (12.1.11)$$

where $v_t = \Delta u_t = (u_t - u_{t-1})$.


6 On this, see William H. Greene, op. cit., p. 526. 
Equation (12.1.9) is known as the level form and Eq. (12.1.10) is known as the (first) difference form. Both forms are often used in empirical analysis. For example, if in Eq. (12.1.9) $Y$ and $X$ represent the logarithms of consumption expenditure and income, then in Eq. (12.1.10) $\Delta Y$ and $\Delta X$ will represent changes in the logs of consumption expenditure and income. But as we know, a change in the log of a variable is a relative change, or a percentage change if the former is multiplied by 100. So, instead of studying relationships between variables in the level form, we may be interested in their relationships in the growth form.

Now if the error term in Eq. (12.1.8) satisfies the standard OLS assumptions, particularly the assumption of no autocorrelation, it can be shown that the error term $v_t$ in Eq. (12.1.11) is autocorrelated. (The proof is given in Appendix 12A, Section 12A.1.) It may be noted here that models like Eq. (12.1.11) are known as dynamic regression models, that is, models involving lagged regressands. We will study such models in depth in Chapter 17.
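A quick numerical check of this claim, assuming for illustration that $u_t$ is normal white noise with variance $\sigma^2 = 1$: since $v_t = u_t - u_{t-1}$, we have $\operatorname{var}(v_t) = 2\sigma^2$ and $\operatorname{cov}(v_t, v_{t-1}) = -\sigma^2$, so the correlation between successive $v$'s is $-1/2$. The sketch simulates a long white-noise series, first-differences it with `numpy.diff`, and compares the sample lag-1 autocorrelations; the seed, sample size, and helper name are arbitrary.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series, computed from deviations about its mean."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))


rng = np.random.default_rng(0)
u = rng.normal(loc=0.0, scale=1.0, size=100_000)  # u_t: white noise, no autocorrelation
v = np.diff(u)                                     # v_t = u_t - u_{t-1}, as in Eq. (12.1.11)

print(f"lag-1 autocorrelation of u: {lag1_autocorr(u):+.3f}")  # should be near 0
print(f"lag-1 autocorrelation of v: {lag1_autocorr(v):+.3f}")  # should be near -0.5
```

Differencing thus converts an error term with no serial correlation into one with substantial negative first-order autocorrelation.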

The point of the preceding example is that sometimes autocorrelation may be induced 
as a result of transforming the original model. 

Nonstationarity 

We mentioned in Chapter 1 that, while dealing with time series data, we may have to find 
out if a given time series is stationary. Although we will discuss the topic of nonstationary 
time series more thoroughly in the chapters on time series econometrics in Part 5 of the 
text, loosely speaking, a time series is stationary if its characteristics (e.g., mean, variance, 
and covariance) are time invariant; that is, they do not change over time. If that is not the 
case, we have a nonstationary time series. 

As we will discuss in Part 5, in a regression model such as Eq. (12.1.8), it is quite possible that both $Y$ and $X$ are nonstationary and therefore the error $u$ is also nonstationary.⁷ In that case, the error term will exhibit autocorrelation.

In summary, then, there are a variety of reasons why the error term in a regression model 
may be autocorrelated. In the rest of the chapter we investigate in some detail the problems 
posed by autocorrelation and what can be done about it. 

It should be noted also that autocorrelation can be positive (Figure 12.3a) as well as negative, although most economic time series generally exhibit positive autocorrelation because most of them either move upward or downward over extended time periods and do not exhibit a constant up-and-down movement such as that shown in Figure 12.3b.

12.2 OLS Estimation in the Presence of Autocorrelation 


What happens to the OLS estimators and their variances if we introduce autocorrelation in the disturbances by assuming that $E(u_t u_{t+s}) \neq 0$ $(s \neq 0)$ but retain all the other assumptions of the classical model?⁸ Note again that we are now using the subscript $t$ on the disturbances to emphasize that we are dealing with time series data.

7 As we will also see in Part 5, even though $Y$ and $X$ are nonstationary, it is possible to find $u$ to be stationary. We will explore the implication of that later on.

8 If $s = 0$, we obtain $E(u_t^2)$. Since $E(u_t) = 0$ by assumption, $E(u_t^2)$ will represent the variance of the error term, which obviously is nonzero (why?).

FIGURE 12.3 (a) Positive and (b) negative autocorrelation.

We revert once again to the two-variable regression model to explain the basic ideas involved, namely, $Y_t = \beta_1 + \beta_2 X_t + u_t$. To make any headway, we must assume the mechanism that generates $u_t$, for $E(u_t u_{t+s}) \neq 0$ $(s \neq 0)$ is too general an assumption to be of any practical use. As a starting point, or first approximation, one can assume that the disturbance, or error, terms are generated by the following mechanism:

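$$u_t = \rho u_{t-1} + \varepsilon_t \qquad -1 < \rho < 1 \qquad (12.2.1)$$

where $\rho$ (= rho) is known as the coefficient of autocovariance and where $\varepsilon_t$ is the stochastic disturbance term such that it satisfies the standard OLS assumptions, namely,

$$E(\varepsilon_t) = 0, \qquad \operatorname{var}(\varepsilon_t) = \sigma_\varepsilon^2, \qquad \operatorname{cov}(\varepsilon_t, \varepsilon_{t+s}) = 0 \quad s \neq 0 \qquad (12.2.2)$$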

In the engineering literature, an error term with the preceding properties is often called a white noise error term. What Eq. (12.2.1) postulates is that the value of the disturbance term in period $t$ is equal to $\rho$ times its value in the previous period plus a purely random error term.

The scheme (12.2.1) is known as a Markov first-order autoregressive scheme, or simply a first-order autoregressive scheme, usually denoted as AR(1). The name autoregressive is appropriate because Eq. (12.2.1) can be interpreted as the regression of $u_t$ on itself lagged one period. It is first order because $u_t$ and its immediate past value are involved; that is, the maximum lag is 1. If the model were $u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \varepsilon_t$, it would be an AR(2), or second-order, autoregressive scheme, and so on. We will examine such higher-order schemes in the chapters on time series econometrics in Part 5.
In passing, note that $\rho$, the coefficient of autocovariance in Eq. (12.2.1), can also be interpreted as the first-order coefficient of autocorrelation, or more accurately, the coefficient of autocorrelation at lag 1.⁹

Given the AR(1) scheme, it can be shown that (see Appendix 12A, Section 12A.2): 


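$$\operatorname{var}(u_t) = E(u_t^2) = \frac{\sigma_\varepsilon^2}{1-\rho^2} \qquad (12.2.3)$$

$$\operatorname{cov}(u_t, u_{t+s}) = E(u_t u_{t+s}) = \rho^s \frac{\sigma_\varepsilon^2}{1-\rho^2} \qquad (12.2.4)$$

$$\operatorname{cor}(u_t, u_{t+s}) = \rho^s \qquad (12.2.5)$$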

where $\operatorname{cov}(u_t, u_{t+s})$ means covariance between error terms $s$ periods apart and $\operatorname{cor}(u_t, u_{t+s})$ means correlation between error terms $s$ periods apart. Note that because of the symmetry property of covariances and correlations, $\operatorname{cov}(u_t, u_{t+s}) = \operatorname{cov}(u_t, u_{t-s})$ and $\operatorname{cor}(u_t, u_{t+s}) = \operatorname{cor}(u_t, u_{t-s})$.

Since $\rho$ is a constant between $-1$ and $+1$, Eq. (12.2.3) shows that under the AR(1) scheme, the variance of $u_t$ is still homoscedastic, but $u_t$ is correlated not only with its immediate past value but also with its values several periods in the past. It is critical to note that $|\rho| < 1$, that is, the absolute value of $\rho$ is less than 1. If, for example, $\rho$ is 1, the variances and covariances listed above are not defined. If $|\rho| < 1$, we say that the AR(1) process given in Eq. (12.2.1) is stationary; that is, the mean, variance, and covariance of $u_t$ do not change over time. If $|\rho|$ is less than 1, then it is clear from Eq. (12.2.4) that the value of the covariance will decline as we go into the distant past. We will see the utility of the preceding results shortly.
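The moments in Eqs. (12.2.3) to (12.2.5) can be checked by simulation. The sketch below generates a long AR(1) series with $\rho = 0.7$ and $\sigma_\varepsilon = 1$ (illustrative values), draws the starting value from the stationary distribution, and compares the sample variance and lag-$s$ autocorrelations with $\sigma_\varepsilon^2/(1-\rho^2)$ and $\rho^s$. The helpers `simulate_ar1` and `sample_autocorr` are hypothetical names, not part of any library.

```python
import numpy as np


def simulate_ar1(rho, n, sigma_eps=1.0, seed=0):
    """Generate u_t = rho * u_{t-1} + eps_t with eps_t ~ N(0, sigma_eps**2) and |rho| < 1.
    The first value is drawn from the stationary distribution N(0, sigma_eps**2 / (1 - rho**2))."""
    if not -1.0 < rho < 1.0:
        raise ValueError("the AR(1) scheme is stationary only for |rho| < 1")
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma_eps, size=n)
    u = np.empty(n)
    u[0] = rng.normal(0.0, sigma_eps / np.sqrt(1.0 - rho**2))
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return u


def sample_autocorr(u, s):
    """Sample correlation between u_t and u_{t+s} (s >= 1), using deviations from the mean."""
    d = u - u.mean()
    return float(np.sum(d[s:] * d[:-s]) / np.sum(d * d))


rho, sigma_eps = 0.7, 1.0
u = simulate_ar1(rho, 200_000, sigma_eps)
print(f"sample var(u) = {u.var():.3f}; Eq. (12.2.3) gives {sigma_eps**2 / (1 - rho**2):.3f}")
for s in (1, 2, 3):
    print(f"lag {s}: sample cor = {sample_autocorr(u, s):.3f}; Eq. (12.2.5) gives {rho**s:.3f}")
```

The sample autocorrelations decline geometrically with the lag, as Eq. (12.2.5) predicts for $|\rho| < 1$; passing `rho=1.0` raises an error because the variance in Eq. (12.2.3) is then undefined.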

We use the AR(1) process not only because of its simplicity compared to higher-order AR schemes, but also because in many applications it has proved to be quite useful. Additionally, a considerable amount of theoretical and empirical work has been done on the AR(1) scheme.

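Now return to our two-variable regression model: $Y_t = \beta_1 + \beta_2 X_t + u_t$. We know from Chapter 3 that the OLS estimator of the slope coefficient is

$$\hat\beta_2 = \frac{\sum x_t y_t}{\sum x_t^2} \qquad (12.2.6)$$

and its variance is given by

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_t^2} \qquad (12.2.7)$$

where the small letters as usual denote deviations from the mean values.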

9 This name can be easily justified. By definition, the (population) coefficient of correlation between $u_t$ and $u_{t-1}$ is

$$\rho = \frac{E\{[u_t - E(u_t)][u_{t-1} - E(u_{t-1})]\}}{\sqrt{\operatorname{var}(u_t)}\sqrt{\operatorname{var}(u_{t-1})}} = \frac{E(u_t u_{t-1})}{\operatorname{var}(u_{t-1})}$$

since $E(u_t) = 0$ for each $t$ and $\operatorname{var}(u_t) = \operatorname{var}(u_{t-1})$ because we are retaining the assumption of homoscedasticity. The reader can see that $\rho$ is also the slope coefficient in the regression of $u_t$ on $u_{t-1}$.
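Now under the AR(1) scheme, it can be shown that the variance of this estimator is:

$$\operatorname{var}(\hat\beta_2)_{\text{AR1}} = \frac{\sigma^2}{\sum x_t^2}\left[1 + 2\rho\frac{\sum_{t=1}^{n-1} x_t x_{t+1}}{\sum_{t=1}^{n} x_t^2} + 2\rho^2\frac{\sum_{t=1}^{n-2} x_t x_{t+2}}{\sum_{t=1}^{n} x_t^2} + \cdots + 2\rho^{n-1}\frac{x_1 x_n}{\sum_{t=1}^{n} x_t^2}\right] \qquad (12.2.8)$$

where $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$ means the variance of $\hat\beta_2$ under a first-order autoregressive scheme.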

A comparison of Eq. (12.2.8) with Eq. (12.2.7) shows the former is equal to the latter times a term that depends on $\rho$ as well as the sample autocorrelations between the values taken by the regressor $X$ at various lags.¹⁰ And in general we cannot foretell whether $\operatorname{var}(\hat\beta_2)$ is less than or greater than $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$ (but see Eq. [12.4.1] below). Of course, if $\rho$ is zero, the two formulas will coincide, as they should (why?). Also, if the correlations among the successive values of the regressor are very small, the usual OLS variance of the slope estimator will not be seriously biased. But, as a general principle, the two variances will not be the same.

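To give some idea about the difference between the variances given in Eqs. (12.2.7) and (12.2.8), assume that the regressor $X$ also follows the first-order autoregressive scheme with a coefficient of autocorrelation of $r$. Then it can be shown that Eq. (12.2.8) reduces to:

$$\operatorname{var}(\hat\beta_2)_{\text{AR1}} = \frac{\sigma^2}{\sum x_t^2}\left(\frac{1 + r\rho}{1 - r\rho}\right) = \operatorname{var}(\hat\beta_2)_{\text{OLS}}\left(\frac{1 + r\rho}{1 - r\rho}\right) \qquad (12.2.9)$$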


If, for example, $r = 0.6$ and $\rho = 0.8$, using Eq. (12.2.9) we can check that $\operatorname{var}(\hat\beta_2)_{\text{AR1}} = 2.8461\,\operatorname{var}(\hat\beta_2)_{\text{OLS}}$. To put it another way, $\operatorname{var}(\hat\beta_2)_{\text{OLS}} = \frac{1}{2.8461}\operatorname{var}(\hat\beta_2)_{\text{AR1}} = 0.3513\,\operatorname{var}(\hat\beta_2)_{\text{AR1}}$. That is, the usual OLS formula (i.e., Eq. [12.2.7]) will underestimate the variance of $(\hat\beta_2)_{\text{AR1}}$ by about 65 percent. As you will realize, this answer is specific for the given values of $r$ and $\rho$. But the point of this exercise is to warn you that a blind application of the usual OLS formulas to compute the variances and standard errors of the OLS estimators could give seriously misleading results.
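The arithmetic behind the 2.8461 and 0.3513 figures, and the exact bracketed factor of Eq. (12.2.8) for a given regressor, can be computed as follows. This is a sketch: `ar1_variance_factor` implements the bracketed term of Eq. (12.2.8) using deviations $x_t$ from the mean, `ar1_variance_factor_approx` implements the ratio in Eq. (12.2.9), and both names, as well as the trending regressor $X = 1, 2, \ldots, 10$, are illustrative choices.

```python
import numpy as np


def ar1_variance_factor(X, rho):
    """Bracketed term of Eq. (12.2.8): 1 + 2 * sum_s rho**s * sum_t x_t x_{t+s} / sum_t x_t**2,
    where x_t are deviations of the regressor from its mean. Multiplying the usual OLS
    variance sigma**2 / sum(x_t**2) by this factor gives var(b2) under AR(1) errors."""
    x = np.asarray(X, dtype=float)
    x = x - x.mean()
    sxx = np.sum(x * x)
    factor = 1.0
    for s in range(1, x.size):
        factor += 2.0 * rho**s * np.sum(x[:-s] * x[s:]) / sxx
    return float(factor)


def ar1_variance_factor_approx(r, rho):
    """Ratio in Eq. (12.2.9), for a regressor that itself follows AR(1) with coefficient r."""
    return (1.0 + r * rho) / (1.0 - r * rho)


f = ar1_variance_factor_approx(r=0.6, rho=0.8)
print(f"var_AR1 / var_OLS = {f:.4f}")        # 2.8462
print(f"var_OLS / var_AR1 = {1.0 / f:.4f}")  # 0.3514

# Exact factor for an illustrative trending regressor X = 1, 2, ..., 10 (hypothetical choice)
print(f"exact factor, X = 1..10, rho = 0.8: {ar1_variance_factor(np.arange(1, 11), 0.8):.4f}")
```

Rounded to four decimals the first two lines print 2.8462 and 0.3514; the text's 2.8461 and 0.3513 are the same quantities truncated rather than rounded. With `rho=0.0` the exact factor is 1, so Eq. (12.2.8) collapses to Eq. (12.2.7).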

Suppose we continue to use the OLS estimator $\hat\beta_2$ and adjust the usual variance formula by taking into account the AR(1) scheme. That is, we use $\hat\beta_2$ given by Eq. (12.2.6) but use the variance formula given by Eq. (12.2.8). What now are the properties of $\hat\beta_2$? It is easy to prove that $\hat\beta_2$ is still linear and unbiased. As a matter of fact, as shown in Appendix 3A, Section 3A.2, the assumption of no serial correlation, like the assumption of no heteroscedasticity, is not required to prove that $\hat\beta_2$ is unbiased. Is $\hat\beta_2$ still BLUE? Unfortunately, it is not; in the class of linear unbiased estimators, it does not have minimum variance. In short, $\hat\beta_2$, although linear and unbiased, is not efficient (relatively speaking, of course). The reader will notice that this finding is quite similar to the finding that $\hat\beta_2$ is less efficient in the presence of heteroscedasticity. There we saw that it was the weighted least-squares estimator given in Eq. (11.3.8), a special case of the generalized least-squares (GLS) estimator, that was efficient. In the case of autocorrelation can we find an estimator that is BLUE? The answer is yes, as can be seen from the discussion in the following section.


10 Note that the term $r = \sum x_t x_{t+1}/\sum x_t^2$ is the correlation between $X_t$ and $X_{t+1}$ (or $X_{t-1}$, since the correlation coefficient is symmetric); $r_2 = \sum x_t x_{t+2}/\sum x_t^2$ is the correlation between the $X$'s lagged two periods; and so on.
12.3 The BLUE Estimator in the Presence of Autocorrelation

Continuing with the two-variable model and assuming the AR(1) process, we can show that the BLUE estimator of $\beta_2$ is given by the following expression:¹¹

$$\hat\beta_2^{\text{GLS}} = \frac{\sum_{t=2}^{n}(x_t - \rho x_{t-1})(y_t - \rho y_{t-1})}{\sum_{t=2}^{n}(x_t - \rho x_{t-1})^2} + C \qquad (12.3.1)$$

where $C$ is a correction factor that may be disregarded in practice. Note that the subscript $t$ now runs from $t = 2$ to $t = n$. And its variance is given by

$$\operatorname{var}(\hat\beta_2^{\text{GLS}}) = \frac{\sigma_\varepsilon^2}{\sum_{t=2}^{n}(x_t - \rho x_{t-1})^2} + D \qquad (12.3.2)$$

where $D$ too is a correction factor that may also be disregarded in practice. (See Exercise 12.18.)

The estimator $\hat\beta_2^{\text{GLS}}$, as the superscript suggests, is obtained by the method of GLS. As noted in Chapter 11, in GLS we incorporate any additional information we have (e.g., the nature of the heteroscedasticity or of the autocorrelation) directly into the estimating procedure by transforming the variables, whereas in OLS such side information is not directly taken into consideration. As the reader can see, the GLS estimator of $\beta_2$ given in Eq. (12.3.1) incorporates the autocorrelation parameter $\rho$ in the estimating formula, whereas the OLS formula given in Eq. (12.2.6) simply neglects it. Intuitively, this is the reason why the GLS estimator is BLUE and not the OLS estimator: the GLS estimator makes the most use of the available information.¹² It hardly needs to be added that if $\rho = 0$, there is no additional information to be considered and hence both the GLS and OLS estimators are identical.

In short, under autocorrelation, it is the GLS estimator given in Eq. (12.3.1) that is 
BLUE, and the minimum variance is now given by Eq. (12.3.2) and not by Eq. (12.2.8) and 
obviously not by Eq. (12.2.7). 
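To see the efficiency gain numerically, the following Monte Carlo sketch compares the sampling variance of the OLS slope with that of a GLS slope when $\rho$ is known. Every setting is an illustrative assumption: $\beta_1 = 1$, $\beta_2 = 0.8$, $\rho = 0.8$, $n = 50$, 2,000 replications, and a fixed regressor that itself follows an AR(1) with $r = 0.6$. The GLS estimate is computed in the spirit of Eq. (12.3.1): OLS with an intercept on the $\rho$-differenced data $Y_t - \rho Y_{t-1}$ and $X_t - \rho X_{t-1}$ for $t = 2, \ldots, n$, which drops the first observation and therefore ignores the correction $C$.

```python
import numpy as np

rng = np.random.default_rng(2024)
n, reps = 50, 2000
beta1, beta2, rho, r = 1.0, 0.8, 0.8, 0.6      # illustrative values

# Fixed regressor (same in every replication), itself AR(1) with coefficient r.
X = np.empty(n)
X[0] = rng.normal(0.0, 1.0 / np.sqrt(1.0 - r**2))
for t in range(1, n):
    X[t] = r * X[t - 1] + rng.normal()

# AR(1) errors for all replications at once (rows = replications, columns = time).
eps = rng.normal(size=(reps, n))
u = np.empty((reps, n))
u[:, 0] = eps[:, 0] / np.sqrt(1.0 - rho**2)    # start from the stationary distribution
for t in range(1, n):
    u[:, t] = rho * u[:, t - 1] + eps[:, t]
Y = beta1 + beta2 * X + u                       # shape (reps, n)


def ols_slope(x, y):
    """OLS slope with an intercept, sum(x_dev * y_dev) / sum(x_dev**2), one estimate per row of y."""
    xd = x - x.mean()
    yd = y - y.mean(axis=-1, keepdims=True)
    return (yd @ xd) / np.sum(xd * xd)


b_ols = ols_slope(X, Y)

# GLS with known rho: OLS on rho-differenced data for t = 2, ..., n
# (the first observation is dropped, so the correction term C is ignored).
b_gls = ols_slope(X[1:] - rho * X[:-1], Y[:, 1:] - rho * Y[:, :-1])

print(f"OLS: mean {b_ols.mean():.3f}, sampling variance {b_ols.var():.4f}")
print(f"GLS: mean {b_gls.mean():.3f}, sampling variance {b_gls.var():.4f}")
```

Both sets of estimates should average close to the true value 0.8, since both estimators are unbiased, but the GLS estimates should vary considerably less across replications, which is what "efficient" means here.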

A Technical Note 

As we noted in the previous chapter, the Gauss-Markov theorem provides only the sufficient condition for OLS to be BLUE. The necessary and sufficient conditions for OLS to be
BLUE are given by Kruskal’s theorem, mentioned in the previous chapter. Therefore, in 
some cases it can happen that OLS is BLUE despite autocorrelation. But such cases are 
infrequent in practice. 

What happens if we blithely continue to work with the usual OLS procedure despite autocorrelation? The answer is provided in the following section.

11 For proofs, see Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, pp. 274-275. The correction factor $C$ pertains to the first observation, $(Y_1, X_1)$. On this point see Exercise 12.18.
12 The formal proof that $\hat\beta_2^{\text{GLS}}$ is BLUE can be found in Kmenta, ibid. But the tedious algebraic proof can be simplified considerably using matrix notation. See J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, pp. 291-293.
12.4 Consequences of Using OLS in the Presence of Autocorrelation


As in the case of heteroscedasticity, in the presence of autocorrelation the OLS estimators are still linear unbiased as well as consistent and asymptotically normally distributed, but they are no longer efficient (i.e., minimum variance). What then happens to our usual hypothesis testing procedures if we continue to use the OLS estimators? Again, as in the case of heteroscedasticity, we distinguish two cases. For pedagogical purposes we still continue to work with the two-variable model, although the following discussion can be extended to multiple regressions without much trouble.¹³

OLS Estimation Allowing for Autocorrelation 

As noted, $\hat\beta_2$ is not BLUE, and even if we use $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$, the confidence intervals derived from there are likely to be wider than those based on the GLS procedure. As Kmenta shows, this result is likely to be the case even if the sample size increases indefinitely.¹⁴ That is, $\hat\beta_2$ is not asymptotically efficient. The implication of this finding for hypothesis testing is clear: We are likely to declare a coefficient statistically insignificant (i.e., not different from zero) even though in fact (i.e., based on the correct GLS procedure) it may be significant. This difference can be seen clearly from Figure 12.4. In this figure we show the 95% OLS [AR(1)] and GLS confidence intervals assuming that the true $\beta_2 = 0$. Consider a particular estimate of $\beta_2$, say, $b_2$. Since $b_2$ lies in the OLS confidence interval, we could accept the hypothesis that the true $\beta_2$ is zero with 95 percent confidence. But if we were to use the (correct) GLS confidence interval, we could reject the null hypothesis that the true $\beta_2$ is zero, for $b_2$ lies in the region of rejection.

The message is: To establish confidence intervals and to test hypotheses, one should 
use GLS and not OLS even though the estimators derived from the latter are unbiased 
and consistent. (However, see Section 12.11 later.) 


FIGURE 12.4 GLS and OLS 95% confidence intervals (labels: GLS 95% interval; OLS 95% interval).


OLS Estimation Disregarding Autocorrelation 

The situation is potentially very serious if we not only use $\hat\beta_2$ but also continue to use $\operatorname{var}(\hat\beta_2) = \sigma^2/\sum x_t^2$, which completely disregards the problem of autocorrelation, that is, if we mistakenly believe that the usual assumptions of the classical model hold true. Errors will arise for the following reasons:

1. The residual variance $\hat\sigma^2 = \sum \hat u_t^2/(n-2)$ is likely to underestimate the true $\sigma^2$.

2. As a result, we are likely to overestimate $R^2$.

3. Even if $\hat\sigma^2$ is not underestimated, $\operatorname{var}(\hat\beta_2)$ may underestimate $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$ (Eq. [12.2.8]), its variance under (first-order) autocorrelation, even though the latter is inefficient compared to $\operatorname{var}(\hat\beta_2)_{\text{GLS}}$.

4. Therefore, the usual $t$ and $F$ tests of significance are no longer valid, and if applied, are likely to give seriously misleading conclusions about the statistical significance of the estimated regression coefficients.

13 But matrix algebra becomes almost a necessity to avoid tedious algebraic manipulations.
14 See Kmenta, op. cit., pp. 277-278.

To establish some of these propositions, let us revert to the two-variable model. We know from Chapter 3 that under the classical assumptions

$$\hat\sigma^2 = \frac{\sum \hat u_t^2}{n-2}$$

provides an unbiased estimator of $\sigma^2$, that is, $E(\hat\sigma^2) = \sigma^2$. But if there is autocorrelation, given by AR(1), it can be shown that


$$E(\hat\sigma^2) = \frac{\sigma^2\{n - [2/(1-\rho)] - 2\rho r\}}{n-2} \qquad (12.4.1)$$


where $r = \sum_{t=1}^{n-1} x_t x_{t+1} / \sum_{t=1}^{n} x_t^2$, which can be interpreted as the (sample) correlation coefficient between successive values of the $X$'s.¹⁵ If $\rho$ and $r$ are both positive (not an unlikely assumption for most economic time series), it is apparent from Eq. (12.4.1) that $E(\hat\sigma^2) < \sigma^2$; that is, the usual residual variance formula, on average, will underestimate the true $\sigma^2$. In other words, $\hat\sigma^2$ will be biased downward. Needless to say, this bias in $\hat\sigma^2$ will be transmitted to $\operatorname{var}(\hat\beta_2)$ because in practice we estimate the latter by the formula $\hat\sigma^2/\sum x_t^2$.

But even if $\hat\sigma^2$ is not underestimated, $\operatorname{var}(\hat\beta_2)$ is a biased estimator of $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$, which can be readily seen by comparing Eq. (12.2.7) with Eq. (12.2.8),¹⁶ for the two formulas are not the same. As a matter of fact, if $\rho$ is positive (which is true of most economic time series) and the $X$'s are positively correlated (also true of most economic time series), then it is clear that


$$\operatorname{var}(\hat\beta_2) < \operatorname{var}(\hat\beta_2)_{\text{AR1}} \qquad (12.4.2)$$

that is, the usual OLS variance of $\hat\beta_2$ underestimates its variance under AR(1) (see Eq. [12.2.9]). Therefore, if we use $\operatorname{var}(\hat\beta_2)$, we shall inflate the precision or accuracy (i.e., underestimate the standard error) of the estimator $\hat\beta_2$. As a result, in computing the $t$ ratio as $t = \hat\beta_2/\operatorname{se}(\hat\beta_2)$ (under the hypothesis that $\beta_2 = 0$), we shall be overestimating the $t$ value and hence the statistical significance of the estimated $\beta_2$. The situation is likely to get worse if additionally $\sigma^2$ is underestimated, as noted previously.
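To get a feel for the size of the downward bias implied by Eq. (12.4.1), the sketch below evaluates the formula as stated for an illustrative case: a regressor fixed at $X = 1, 2, \ldots, 10$ (for which $r$ works out to exactly 0.7) and $\rho = 0.7$. Here $\sigma^2$ is the variance of $u_t$; the function names are hypothetical.

```python
import numpy as np


def lag1_regressor_corr(X):
    """r in Eq. (12.4.1): sum_{t=1}^{n-1} x_t x_{t+1} / sum_{t=1}^{n} x_t**2, x = deviations from mean."""
    x = np.asarray(X, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))


def expected_sigma2_ratio(n, rho, r):
    """E(sigma_hat**2) / sigma**2 according to Eq. (12.4.1)."""
    return (n - 2.0 / (1.0 - rho) - 2.0 * rho * r) / (n - 2)


X = np.arange(1, 11)            # illustrative regressor X = 1, 2, ..., 10
r = lag1_regressor_corr(X)      # 57.75 / 82.5 = 0.7
ratio = expected_sigma2_ratio(len(X), rho=0.7, r=r)
print(f"r = {r:.4f}, E(sigma_hat^2)/sigma^2 = {ratio:.4f}")   # r = 0.7000, E(sigma_hat^2)/sigma^2 = 0.2942
```

With only 10 observations and these values of $\rho$ and $r$, Eq. (12.4.1) implies that $\hat\sigma^2$ would on average be less than a third of the true $\sigma^2$. Setting `rho=0.0` returns a ratio of exactly 1, the unbiased case.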

To see how OLS is likely to underestimate $\sigma^2$ and the variance of $\hat\beta_2$, let us conduct the following Monte Carlo experiment. Suppose in the two-variable model we “know” that the true $\beta_1 = 1$ and $\beta_2 = 0.8$. Therefore, the stochastic PRF is

$$Y_t = 1.0 + 0.8X_t + u_t \qquad (12.4.3)$$


15 See S. M. Goldfeld and R. E. Quandt, Nonlinear Methods in Econometrics, North Holland Publishing Company, Amsterdam, 1972, p. 183. In passing, note that if the errors are positively autocorrelated, the $R^2$ value tends to have an upward bias, that is, it tends to be larger than the $R^2$ in the absence of such correlation.

16 For a formal proof, see Kmenta, op. cit., p. 281. 
TABLE 12.1 A Hypothetical Example of Positively Autocorrelated Error Terms

t     $\varepsilon_t$     $u_t = 0.7u_{t-1} + \varepsilon_t$
0     0          $u_0 = 5$ (assumed)
1     0.464      $u_1 = 0.7(5) + 0.464 = 3.964$
2     2.026      $u_2 = 0.7(3.964) + 2.026 = 4.8008$
3     2.455      $u_3 = 0.7(4.8010) + 2.455 = 5.8157$
4     -0.323     $u_4 = 0.7(5.8157) - 0.323 = 3.7480$
5     -0.068     $u_5 = 0.7(3.7480) - 0.068 = 2.5556$
6     0.296      $u_6 = 0.7(2.5556) + 0.296 = 2.0849$
7     -0.288     $u_7 = 0.7(2.0849) - 0.288 = 1.1714$
8     1.298      $u_8 = 0.7(1.1714) + 1.298 = 2.1180$
9     0.241      $u_9 = 0.7(2.1180) + 0.241 = 1.7236$
10    -0.957     $u_{10} = 0.7(1.7236) - 0.957 = 0.2495$

Note: $\varepsilon_t$ data obtained from A Million Random Digits and One Hundred Thousand Deviates, Rand Corporation, Santa Monica, Calif., 1950.


Hence, 

$$E(Y_t \mid X_t) = 1.0 + 0.8X_t \qquad (12.4.4)$$

which gives the true population regression line. Let us assume that u t are generated by the 
first-order autoregressive scheme as follows: 

$$u_t = 0.7u_{t-1} + \varepsilon_t \qquad (12.4.5)$$

where $\varepsilon_t$ satisfy all the OLS assumptions. We assume further for convenience that the $\varepsilon_t$ are normally distributed with zero mean and unit (= 1) variance. Equation (12.4.5) postulates that the successive disturbances are positively correlated, with a coefficient of autocorrelation of +0.7, a rather high degree of dependence.

Now, using a table of random normal numbers with zero mean and unit variance, we generated 10 random numbers shown in Table 12.1 and then by the scheme (12.4.5) we generated $u_t$. To start off the scheme, we need to specify the initial value of $u$, say, $u_0 = 5$.

Plotting the $u_t$ generated in Table 12.1, we obtain Figure 12.5, which shows that initially each successive $u_t$ is higher than its previous value and subsequently it is generally smaller than its previous value, showing, in general, a positive autocorrelation.

Now suppose the values of $X$ are fixed at 1, 2, 3, ..., 10. Then, given these $X$'s, we can generate a sample of 10 $Y$ values from Eq. (12.4.3) and the values of $u_t$ given in Table 12.1. The details are given in Table 12.2. Using the data of Table 12.2, if we regress $Y$ on $X$, we obtain the following (sample) regression:

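$$
\begin{aligned}
\hat{Y}_t &= 6.5452 + 0.3051X_t \qquad (12.4.6)\\
\text{se} &= (0.6153) \qquad (0.0992)\\
t &= (10.6366) \qquad (3.0763)\\
r^2 &= 0.5419 \qquad \hat\sigma^2 = 0.8114
\end{aligned}
$$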


whereas the true regression line is as given by Eq. (12.4.4). Both the regression lines are 
given in Figure 12.6, which shows clearly how much the fitted regression line distorts the 
true regression line; it seriously underestimates the true slope coefficient but overestimates 
the true intercept. (But note that the OLS estimators are still unbiased.) 

Figure 12.6 also shows why the true variance of $u_t$ is likely to be underestimated by the estimator $\hat\sigma^2$, which is computed from the $\hat u_t$. The $\hat u_t$ are generally close to the fitted line (which is due to the OLS procedure) but deviate substantially from the true PRF. Hence, they do not give a correct picture of $u_t$. To gain some insight into the extent of underestimation of the true $\sigma^2$, suppose we conduct another sampling experiment. Keeping the $X_t$ and $\varepsilon_t$ given in Tables 12.1 and 12.2, let us assume $\rho = 0$, that is, no autocorrelation. The new sample of $Y$ values thus generated is given in Table 12.3.

FIGURE 12.5 Correlation generated by the scheme $u_t = 0.7u_{t-1} + \varepsilon_t$ (Table 12.1).

TABLE 12.2 Generation of Y Sample Values

$u_t$      $Y_t = 1.0 + 0.8X_t + u_t$
3.9640     $Y_1 = 1.0 + 0.8(1) + 3.9640 = 5.7640$
4.8010     $Y_2 = 1.0 + 0.8(2) + 4.8008 = 7.4008$
5.8157     $Y_3 = 1.0 + 0.8(3) + 5.8157 = 9.2157$
3.7480     $Y_4 = 1.0 + 0.8(4) + 3.7480 = 7.9480$
2.5556     $Y_5 = 1.0 + 0.8(5) + 2.5556 = 7.5556$
2.0849     $Y_6 = 1.0 + 0.8(6) + 2.0849 = 7.8849$
1.1714     $Y_7 = 1.0 + 0.8(7) + 1.1714 = 7.7714$
2.1180     $Y_8 = 1.0 + 0.8(8) + 2.1180 = 9.5180$
1.7236     $Y_9 = 1.0 + 0.8(9) + 1.7236 = 9.9236$
0.2495     $Y_{10} = 1.0 + 0.8(10) + 0.2495 = 9.2495$

Note: $u_t$ obtained from Table 12.1.
FIGURE 12.6 True PRF and the estimated regression line for the data of Table 12.2.

TABLE 12.3 Sample of Y Values with Zero Serial Correlation

$X_t$   $\varepsilon_t = u_t$   $Y_t = 1.0 + 0.8X_t + \varepsilon_t$
1       0.464       2.264
2       2.026       4.626
3       2.455       5.855
4       -0.323      3.877
5       -0.068      4.932
6       0.296       6.096
7       -0.288      6.312
8       1.298       8.698
9       0.241       8.441
10      -0.957      8.043

Note: Since there is no autocorrelation, the $u_t$ and $\varepsilon_t$ are identical. The $\varepsilon_t$ are from Table 12.1.

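The regression based on Table 12.3 is as follows:

$$
\begin{aligned}
\hat{Y}_t &= 2.5345 + 0.6145X_t \qquad (12.4.7)\\
\text{se} &= (0.6796) \qquad (0.1087)\\
t &= (3.7910) \qquad (5.6541)\\
r^2 &= 0.7997 \qquad \hat\sigma^2 = 0.9752
\end{aligned}
$$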


This regression is much closer to the “truth” because the $Y$'s are now essentially random. Notice that $\hat\sigma^2$ has increased from 0.8114 ($\rho = 0.7$) to 0.9752 ($\rho = 0$). Also notice that the standard errors of $\hat\beta_1$ and $\hat\beta_2$ have increased. This result is in accord with the theoretical results considered previously.
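The two sampling experiments can be reproduced from the $\varepsilon_t$ in Table 12.1. The sketch below (plain NumPy; the helper names `ar1_errors` and `ols` are illustrative) rebuilds $u_t$ from $u_0 = 5$ by the scheme (12.4.5), forms $Y_t = 1.0 + 0.8X_t + u_t$ with $X_t = 1, \ldots, 10$, and fits OLS; with $\rho = 0$ the same code gives $u_t = \varepsilon_t$, as in Table 12.3. Because the $u_t$ are recomputed at full precision rather than taken from the rounded entries of Table 12.1, the results can differ from Eqs. (12.4.6) and (12.4.7) in the fourth decimal place.

```python
import numpy as np

# epsilon_t from Table 12.1 and the fixed regressor X_t = 1, ..., 10
eps = np.array([0.464, 2.026, 2.455, -0.323, -0.068, 0.296, -0.288, 1.298, 0.241, -0.957])
X = np.arange(1, 11, dtype=float)


def ar1_errors(eps, rho, u0):
    """u_t = rho * u_{t-1} + eps_t for t = 1, ..., n, started from the assumed value u_0."""
    u = np.empty_like(eps)
    prev = u0
    for t, e in enumerate(eps):
        prev = rho * prev + e
        u[t] = prev
    return u


def ols(x, y):
    """Two-variable OLS: intercept, slope, sigma_hat**2 = RSS / (n - 2), and r**2."""
    xd, yd = x - x.mean(), y - y.mean()
    b2 = np.sum(xd * yd) / np.sum(xd * xd)
    b1 = y.mean() - b2 * x.mean()
    resid = y - b1 - b2 * x
    rss = np.sum(resid**2)
    return b1, b2, rss / (len(y) - 2), 1.0 - rss / np.sum(yd**2)


for rho in (0.7, 0.0):   # rho = 0 gives u_t = eps_t, as in Table 12.3
    Y = 1.0 + 0.8 * X + ar1_errors(eps, rho, u0=5.0)
    b1, b2, s2, r2 = ols(X, Y)
    print(f"rho = {rho}: b1 = {b1:.4f}, b2 = {b2:.4f}, sigma_hat^2 = {s2:.4f}, r^2 = {r2:.4f}")
```

Comparing the two printed lines shows the same pattern as the text: with autocorrelated errors the slope is badly off and $\hat\sigma^2$ is smaller than in the $\rho = 0$ case.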
12.5 Relationship between Wages and Productivity in the Business Sector of the United States, 1960-2005


Now that we have discussed the consequences of autocorrelation, the obvious question is, How do we detect it and how do we correct for it? Before we turn to these topics, it is useful to consider a concrete example. Table 12.4 gives data on indexes of real compensation per hour $Y$ (RCOMPB) and output per hour $X$ (PRODB) in the business sector of the U.S. economy for the period 1960-2005, the base of the indexes being 1992 = 100.

First plotting the data on $Y$ and $X$, we obtain Figure 12.7. Since the relationship between
real compensation and labor productivity is expected to be positive, it is not surprising that 
the two variables are positively related. What is surprising is that the relationship between 
the two is almost linear, although there is some hint that at higher values of productivity the 
relationship between the two may be slightly nonlinear. Therefore, we decided to estimate 
a linear as well as a log-linear model, with the following results: 

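$$
\begin{aligned}
\hat{Y}_t &= 32.7419 + 0.6704X_t \\
\text{se} &= (1.3940) \quad (0.0157) \\
t &= (23.4874) \quad (42.7813) \\
r^2 &= 0.9765 \qquad d = 0.1739 \qquad \hat{\sigma} = 2.3845
\end{aligned}
\tag{12.5.1}
$$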


TABLE 12.4 Indexes of Real Compensation and Productivity, U.S., 1960-2005
(Index numbers, 1992 = 100; quarterly data seasonally adjusted)

Year     Y       X         Year     Y       X
1960    60.8    48.9       1983    90.3    83.0
1961    62.5    50.6       1984    90.7    85.2
1962    64.6    52.9       1985    92.0    87.1
1963    66.1    55.0       1986    94.9    89.7
1964    67.7    56.8       1987    95.2    90.1
1965    69.1    58.8       1988    96.5    91.5
1966    71.7    61.2       1989    95.0    92.4
1967    73.5    62.5       1990    96.2    94.4
1968    76.2    64.7       1991    97.4    95.9
1969    77.3    65.0       1992   100.0   100.0
1970    78.8    66.3       1993    99.7   100.4
1971    80.2    69.0       1994    99.0   101.3
1972    82.6    71.2       1995    98.7   101.5
1973    84.3    73.4       1996    99.4   104.5
1974    83.3    72.3       1997   100.5   106.5
1975    84.1    74.8       1998   105.2   109.5
1976    86.4    77.1       1999   108.0   112.8
1977    87.6    78.5       2000   112.0   116.1
1978    89.1    79.3       2001   113.5   119.1
1979    89.3    79.3       2002   115.7   124.0
1980    89.1    79.2       2003   117.7   128.7
1981    89.3    80.8       2004   119.0   132.7
1982    90.4    80.1       2005   120.2   135.7

Notes: Y = index of real compensation per hour, business sector (1992 = 100).
X = index of output per hour, business sector (1992 = 100).
Source: Economic Report of the President, 2007, Table B-49.
FIGURE 12.7 Index of compensation (Y) and index of productivity (X), United States, 1960-2005.


where d is the Durbin-Watson statistic, which will be discussed shortly. 
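$$
\begin{aligned}
\widehat{\ln Y_t} &= 1.6067 + 0.6522\ln X_t \\
\text{se} &= (0.0547) \quad (0.0124) \\
t &= (29.3680) \quad (52.7996) \\
r^2 &= 0.9845 \qquad d = 0.2176 \qquad \hat{\sigma} = 0.0221
\end{aligned}
\tag{12.5.2}
$$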

Since the above model is double-log, the slope coefficient represents elasticity. In the present case, we see that if labor productivity goes up by 1 percent, the average compensation goes up by about 0.65 percent.

Qualitatively, both models give similar results. In both cases the estimated coefficients are "highly" significant, as indicated by the high t values. In the linear model, if the index of productivity goes up by a unit, on average, the index of compensation goes up by about 0.67 units. In the log-linear model, the slope coefficient being elasticity (why?), we find that if the index of productivity goes up by 1 percent, on average, the index of real compensation goes up by about 0.65 percent.
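To reproduce Eqs. (12.5.1) and (12.5.2), the sketch below fits both the linear and the double-log specification to the Table 12.4 data by OLS and computes the Durbin-Watson d of each, using the definition $d = \sum_{t=2}^n(\hat u_t - \hat u_{t-1})^2 / \sum_{t=1}^n \hat u_t^2$ given in Section 12.6. It is standard-library Python; the names `ols_fit` and `durbin_watson` are our own and are not a library API. In the double-log fit, the slope is the elasticity of compensation with respect to productivity.

```python
import math

# Table 12.4: Y = real compensation per hour, X = output per hour (1992 = 100), 1960-2005
Y = [60.8, 62.5, 64.6, 66.1, 67.7, 69.1, 71.7, 73.5, 76.2, 77.3, 78.8, 80.2,
     82.6, 84.3, 83.3, 84.1, 86.4, 87.6, 89.1, 89.3, 89.1, 89.3, 90.4, 90.3,
     90.7, 92.0, 94.9, 95.2, 96.5, 95.0, 96.2, 97.4, 100.0, 99.7, 99.0, 98.7,
     99.4, 100.5, 105.2, 108.0, 112.0, 113.5, 115.7, 117.7, 119.0, 120.2]
X = [48.9, 50.6, 52.9, 55.0, 56.8, 58.8, 61.2, 62.5, 64.7, 65.0, 66.3, 69.0,
     71.2, 73.4, 72.3, 74.8, 77.1, 78.5, 79.3, 79.3, 79.2, 80.8, 80.1, 83.0,
     85.2, 87.1, 89.7, 90.1, 91.5, 92.4, 94.4, 95.9, 100.0, 100.4, 101.3, 101.5,
     104.5, 106.5, 109.5, 112.8, 116.1, 119.1, 124.0, 128.7, 132.7, 135.7]

def ols_fit(x, y):
    """Two-variable OLS: returns (intercept, slope, residuals, r^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return b1, b2, resid, 1.0 - sum(e * e for e in resid) / tss

def durbin_watson(resid):
    """Sum of squared successive differences of the residuals divided by the RSS."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

lin = ols_fit(X, Y)                                                 # Eq. (12.5.1)
loglin = ols_fit([math.log(v) for v in X], [math.log(v) for v in Y])  # Eq. (12.5.2)
for name, (b1, b2, resid, r2) in [("linear", lin), ("log-linear", loglin)]:
    print(f"{name:10s} b1 = {b1:.4f}  b2 = {b2:.4f}  r2 = {r2:.4f}  d = {durbin_watson(resid):.4f}")
```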

How reliable are the results given in Eqs. (12.5.1) and (12.5.2) if there is autocorrelation? As stated previously, if there is autocorrelation, the estimated standard errors are biased, and as a result the estimated t ratios are unreliable. We obviously need to find out whether our data suffer from autocorrelation. In the following section we discuss several methods of detecting autocorrelation. We will illustrate these methods with the log-linear model (12.5.2).

12.6 Detecting Autocorrelation 
I. Graphical Method 

FIGURE 12.8 Residuals (magnified 100 times) and standardized residuals from the wages-productivity regression (log form: model 12.5.2), 1960-2005.

Recall that the assumption of nonautocorrelation of the classical model relates to the population disturbances $u_t$, which are not directly observable. What we have instead are their proxies, the residuals $\hat u_t$, which can be obtained by the usual OLS procedure. Although the $\hat u_t$ are not the same thing as the $u_t$,¹⁷ very often a visual examination of the $\hat u$'s gives us some clues about the likely presence of autocorrelation in the $u$'s. Actually, a visual examination of $\hat u_t$ or $\hat u_t^2$ can provide useful information not only about autocorrelation but also about heteroscedasticity (as we saw in the preceding chapter), model inadequacy, or specification bias, as we shall see in the next chapter. As one author notes:

The importance of producing and analyzing plots [of residuals] as a standard part of statistical analysis cannot be overemphasized. Besides occasionally providing an easy to understand summary of a complex problem, they allow the simultaneous examination of the data as an aggregate while clearly displaying the behavior of individual cases.¹⁸

There are various ways of examining the residuals. We can simply plot them against time, the time sequence plot, as we have done in Figure 12.8, which shows the residuals obtained from the log wages-productivity regression (12.5.2). The values of these residuals are given in Table 12.5 along with some other data.

Alternatively, we can plot the standardized residuals against time; these are also shown in Figure 12.8 and Table 12.5. The standardized residuals are simply the residuals $\hat u_t$ divided by the standard error of the regression $\sqrt{\hat\sigma^2}$, that is, they are $\hat u_t/\hat\sigma$. Notice that $\hat u_t$ and $\hat\sigma$ are both measured in the units of the regressand Y. The standardized residuals are therefore pure numbers (devoid of units of measurement) and can be compared with the standardized residuals of other regressions. Moreover, the standardized residuals, like $\hat u_t$, have zero mean (why?) and approximately unit variance.¹⁹



17 Even if the disturbances $u_t$ are homoscedastic and uncorrelated, their estimators, the residuals $\hat u_t$, are heteroscedastic and autocorrelated. On this, see G. S. Maddala, Introduction to Econometrics, 2d ed., Macmillan, New York, 1992, pp. 480-481. However, it can be shown that as the sample size increases indefinitely, the residuals tend to converge to their true values, the $u_t$'s. On this see E. Malinvaud, Statistical Methods of Econometrics, 2d ed., North-Holland Publishers, Amsterdam, 1970, p. 88.

18 Sanford Weisberg, Applied Linear Regression, John Wiley & Sons, New York, 1980, p. 120.

19 Actually, it is the so-called Studentized residuals that have a unit variance. But in practice the standardized residuals will give the same picture, and hence we may rely on them. On this, see Norman Draper and Harry Smith, Applied Regression Analysis, 3d ed., John Wiley & Sons, New York, 1998, pp. 207-208.
TABLE 12.5 Residuals: Actual, Standardized, and Lagged

Obs.    S1          SDRES       S1(-1)        Obs.    S1          SDRES       S1(-1)
1960   -0.036068   -1.639433    NA           1983    0.014416    0.655291    0.038719
1961   -0.030780   -1.399078   -0.036068     1984    0.001774    0.080626    0.014416
1962   -0.026724   -1.214729   -0.030780     1985    0.001620    0.073640    0.001774
1963   -0.029160   -1.325472   -0.026724     1986    0.013471    0.612317    0.001620
1964   -0.026246   -1.193017   -0.029160     1987    0.013725    0.623875    0.013471
1965   -0.028348   -1.288551   -0.026246     1988    0.017232    0.783269    0.013725
1966   -0.017504   -0.795647   -0.028348     1989   -0.004818   -0.219005    0.017232
1967   -0.006419   -0.291762   -0.017504     1990   -0.006232   -0.283285   -0.004818
1968    0.007094    0.322459   -0.006419     1991   -0.004118   -0.187161   -0.006232
1969    0.018409    0.836791    0.007094     1992   -0.005078   -0.230822   -0.004118
1970    0.024713    1.123311    0.018409     1993   -0.010686   -0.485739   -0.005078
1971    0.016289    0.740413    0.024713     1994   -0.023553   -1.070573   -0.010686
1972    0.025305    1.150208    0.016289     1995   -0.027874   -1.266997   -0.023553
1973    0.025829    1.174049    0.025305     1996   -0.039805   -1.809304   -0.027874
1974    0.023744    1.079278    0.025829     1997   -0.041164   -1.871079   -0.039805
1975    0.011131    0.505948    0.023744     1998   -0.013576   -0.617112   -0.041164
1976    0.018359    0.834515    0.011131     1999   -0.006674   -0.303364   -0.013576
1977    0.020416    0.927990    0.018359     2000    0.010887    0.494846   -0.006674
1978    0.030781    1.399135    0.020416     2001    0.007551    0.343250    0.010887
1979    0.033023    1.501051    0.030781     2002    0.000453    0.020599    0.007551
1980    0.031604    1.436543    0.033023     2003   -0.006673   -0.303298    0.000453
1981    0.020801    0.945516    0.031604     2004   -0.015650   -0.711380   -0.006673
1982    0.038719    1.759960    0.020801     2005   -0.020198   -0.918070   -0.015650

Notes: S1 = residuals from the wages-productivity regression (log form).
S1(-1) = residuals lagged one period.
SDRES = standardized residuals = residuals/standard error of estimate.


In large samples $\hat u_t/\hat\sigma$ is approximately normally distributed with zero mean and unit variance. For our example, $\hat\sigma \approx 0.022$, the standard error of the log regression (12.5.2).

Examining the time sequence plot given in Figure 12.8, we observe that both $\hat u_t$ and the standardized $\hat u_t$ exhibit a pattern like the one observed in Figure 12.1d, suggesting that the $u_t$ are perhaps not random.

To see this differently, we can plot $\hat u_t$ against $\hat u_{t-1}$, that is, plot the residuals at time t against their value at time (t - 1), a kind of empirical test of the AR(1) scheme. If the residuals are nonrandom, we should obtain pictures similar to those shown in Figure 12.3. This plot for our log wages-productivity regression is shown in Figure 12.9; the underlying data are given in Table 12.5. As this figure reveals, most of the residuals are bunched in the northeast and southwest quadrants, suggesting a strong positive correlation in the residuals.
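The lag plot of Figure 12.9 can also be summarized numerically. The following Python sketch takes the residuals S1 of Table 12.5 and counts how many of the 45 pairs $(\hat u_{t-1}, \hat u_t)$ fall in each quadrant, with the lagged residual on the horizontal axis and the current residual on the vertical axis. The helper `quadrant_counts` is hypothetical and not from the text. A residual of exactly zero would be counted as negative; that convention is arbitrary and does not matter for these data.

```python
# Residuals S1 from Table 12.5 (log wages-productivity regression), 1960-2005
S1 = [
    -0.036068, -0.030780, -0.026724, -0.029160, -0.026246, -0.028348,
    -0.017504, -0.006419, 0.007094, 0.018409, 0.024713, 0.016289,
    0.025305, 0.025829, 0.023744, 0.011131, 0.018359, 0.020416,
    0.030781, 0.033023, 0.031604, 0.020801, 0.038719, 0.014416,
    0.001774, 0.001620, 0.013471, 0.013725, 0.017232, -0.004818,
    -0.006232, -0.004118, -0.005078, -0.010686, -0.023553, -0.027874,
    -0.039805, -0.041164, -0.013576, -0.006674, 0.010887, 0.007551,
    0.000453, -0.006673, -0.015650, -0.020198,
]

def quadrant_counts(resid):
    """Count lag-plot points (u_hat_{t-1}, u_hat_t) falling in each quadrant."""
    counts = {"NE": 0, "NW": 0, "SW": 0, "SE": 0}
    for lagged, current in zip(resid, resid[1:]):
        vertical = "N" if current > 0 else "S"    # current residual on the vertical axis
        horizontal = "E" if lagged > 0 else "W"   # lagged residual on the horizontal axis
        counts[vertical + horizontal] += 1
    return counts

counts = quadrant_counts(S1)
same_sign_share = (counts["NE"] + counts["SW"]) / sum(counts.values())
print(counts)
print(f"share of same-sign pairs: {same_sign_share:.2f}")
```

For these residuals, 22 pairs fall in the northeast and 19 in the southwest quadrant, with only 2 in each of the other two. That is the clustering described above.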

The graphical method we have just discussed, although powerful and suggestive, is subjective or qualitative in nature. But there are several quantitative tests that one can use to supplement the purely qualitative approach. We now consider some of these tests.


II. The Runs Test 

If we carefully examine Figure 12.8, we notice a peculiar feature: Initially, we have several residuals that are negative, then there is a series of positive residuals, and then there are several residuals that are negative. If these residuals were purely random, could we observe such a pattern? Intuitively, it seems unlikely. This intuition can be checked by the so-called runs test, sometimes also known as the Geary test, a nonparametric test.²⁰

FIGURE 12.9 Current residuals versus lagged residuals.

To explain the runs test, let us simply note down the signs (+ or -) of the residuals obtained from the wages-productivity regression, which are given in the first column of Table 12.5.

(--------)(+++++++++++++++++++++)(-----------)(+++)(---)          (12.6.1)

Thus there are 8 negative residuals, followed by 21 positive residuals, followed by 11 negative residuals, followed by 3 positive residuals, followed by 3 negative residuals, for a total of 46 observations.

We now define a run as an uninterrupted sequence of one symbol or attribute, such as + or -. We further define the length of a run as the number of elements in it. In the sequence shown in Eq. (12.6.1), there are 5 runs: a run of 8 minuses (i.e., of length 8), a run of 21 pluses (i.e., of length 21), a run of 11 minuses (i.e., of length 11), a run of 3 pluses (i.e., of length 3), and a run of 3 minuses (i.e., of length 3). For a better visual effect, we have presented the various runs in parentheses.

By examining how runs behave in a strictly random sequence of observations, one can derive a test of randomness of runs. We ask this question: Are the 5 runs observed in our illustrative example consisting of 46 observations too many or too few compared with the number of runs expected in a strictly random sequence of 46 observations? If there are too many runs, it would mean that in our example the residuals change sign frequently, thus indicating negative serial correlation (cf. Figure 12.3b). Similarly, if there are too few runs, they may suggest positive autocorrelation, as in Figure 12.3a. A priori, then, Figure 12.8 would indicate positive correlation in the residuals.

20 In nonparametric tests we make no assumptions about the (probability) distribution from which the observations are drawn. On the Geary test, see R. C. Geary, "Relative Efficiency of Count of Sign Changes for Assessing Residual Autoregression in Least Squares Regression," Biometrika, vol. 57, 1970, pp. 123-127.

Now let

$N$ = total number of observations = $N_1 + N_2$
$N_1$ = number of + symbols (i.e., + residuals)
$N_2$ = number of - symbols (i.e., - residuals)
$R$ = number of runs

Then under the null hypothesis that the successive outcomes (here, residuals) are independent, and assuming that $N_1 > 10$ and $N_2 > 10$, the number of runs is (asymptotically) normally distributed with

$$
\text{Mean: } E(R) = \frac{2N_1N_2}{N} + 1
\qquad
\text{Variance: } \sigma_R^2 = \frac{2N_1N_2(2N_1N_2 - N)}{N^2(N - 1)}
\tag{12.6.2}
$$

Note: $N = N_1 + N_2$.

If the null hypothesis of randomness is sustainable, then, following the properties of the normal distribution, we should expect that

$$
\Pr\left[E(R) - 1.96\sigma_R \le R \le E(R) + 1.96\sigma_R\right] = 0.95
\tag{12.6.3}
$$

That is, the probability is 95 percent that the preceding interval will include R. Therefore we have this rule:

Decision Rule: Do not reject the null hypothesis of randomness with 95% confidence if R, the number of runs, lies in the preceding confidence interval; reject the null hypothesis if the estimated R lies outside these limits. (Note: You can choose any level of confidence you want.)


Returning to our example, we know that $N_1$, the number of pluses, is 24, $N_2$, the number of minuses, is 22, and $R = 5$. Using the formulas given in Eq. (12.6.2), we obtain:

$$
E(R) \approx 24 \qquad \sigma_R^2 \approx 11 \qquad \sigma_R \approx 3.32
\tag{12.6.4}
$$

The 95% confidence interval for R in our example is thus:

$$
[24 \pm 1.96(3.32)] = (17.5,\ 30.5)
$$

Obviously, this interval does not include 5. Hence, we can reject the hypothesis that the residuals in our wages-productivity regression are random with 95% confidence. In other words, the residuals exhibit autocorrelation. As a general rule, if there is positive autocorrelation, the number of runs will be few, whereas if there is negative autocorrelation, the number of runs will be many. Of course, from Eq. (12.6.2) we can find out whether we have too many runs or too few runs.
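The calculations in Eqs. (12.6.2) through (12.6.4) can be sketched in a few lines of Python. This is an illustration, not code from the text. The sign sequence is rebuilt from the run lengths in Eq. (12.6.1), and the function name `runs_test` and its `z_crit` parameter are our own. The normal approximation assumes $N_1 > 10$ and $N_2 > 10$, as stated above.

```python
import math

def runs_test(signs, z_crit=1.96):
    """Normal-approximation runs (Geary) test on a string of '+'/'-' signs.

    Returns (R, E(R), var(R), (lower, upper), reject_randomness)."""
    n1, n2 = signs.count("+"), signs.count("-")
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)  # each sign change starts a run
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    half_width = z_crit * math.sqrt(var)
    lower, upper = mean - half_width, mean + half_width
    return runs, mean, var, (lower, upper), not (lower <= runs <= upper)

# Signs of the wages-productivity residuals, Eq. (12.6.1): runs of 8, 21, 11, 3, 3
signs = "-" * 8 + "+" * 21 + "-" * 11 + "+" * 3 + "-" * 3
R, mean, var, interval, reject = runs_test(signs)
print(f"R = {R}, E(R) = {mean:.2f}, var = {var:.2f}, "
      f"95% interval = ({interval[0]:.1f}, {interval[1]:.1f}), reject = {reject}")
```

The sketch does not round intermediate values, so it gives E(R) ≈ 23.96 and σ_R² ≈ 11.20, and hence an interval of roughly (17.4, 30.5) instead of (17.5, 30.5). R = 5 lies far outside either interval, so the conclusion is the same.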

Swed and Eisenhart have developed special tables that give critical values of the runs expected in a random sequence of N observations if $N_1$ or $N_2$ is smaller than 20. These tables are given in Appendix D, Table D.6. Using these tables, the reader can verify that the residuals in our wages-productivity regression are indeed nonrandom; in fact, they are positively correlated.

III. Durbin-Watson d Test²¹

The most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin-Watson d statistic, which is defined as

$$
d = \frac{\sum_{t=2}^{t=n}(\hat u_t - \hat u_{t-1})^2}{\sum_{t=1}^{t=n}\hat u_t^2}
\tag{12.6.5}
$$


which is simply the ratio of the sum of squared differences in successive residuals to the RSS. Note that in the numerator of the d statistic the number of observations is n - 1 because one observation is lost in taking successive differences.

A great advantage of the d statistic is that it is based on the estimated residuals, which are routinely computed in regression analysis. Because of this advantage, it is now common practice to report the Durbin-Watson d along with summary measures such as R², adjusted R², t, and F. Although it is now routinely used, it is important to note the assumptions underlying the d statistic.

1. The regression model includes the intercept term. If it is not present, as in the case of the regression through the origin, it is essential to rerun the regression including the intercept term to obtain the RSS.²²

2. The explanatory variables, the X’s, are nonstochastic, or fixed in repeated sampling. 

3. The disturbances $u_t$ are generated by the first-order autoregressive scheme $u_t = \rho u_{t-1} + \varepsilon_t$. Therefore, the test cannot be used to detect higher-order autoregressive schemes.

4. The error term $u_t$ is assumed to be normally distributed.

5. The regression model does not include the lagged value(s) of the dependent variable as one of the explanatory variables. Thus, the test is inapplicable in models of the following type:

$$
Y_t = \beta_1 + \beta_2X_{2t} + \beta_3X_{3t} + \cdots + \beta_kX_{kt} + \gamma Y_{t-1} + u_t
\tag{12.6.6}
$$

where $Y_{t-1}$ is the one-period lagged value of Y. Such models are known as autoregressive models, which we will study in Chapter 17.

6. There are no missing observations in the data. Thus, in our wages-productivity regression for the period 1960-2005, if observations for, say, 1978 and 1982 were missing for some reason, the d statistic would make no allowance for such missing observations.²³


21 J. Durbin and G. S. Watson, "Testing for Serial Correlation in Least-Squares Regression," Biometrika, vol. 38, 1951, pp. 159-171.

22 However, R. W. Farebrother has calculated d values when the intercept term is absent from the model. See his "The Durbin-Watson Test for Serial Correlation When There Is No Intercept in the Regression," Econometrica, vol. 48, 1980, pp. 1553-1563.

23 For further details, see Gabor Korosi, Laszlo Matyas, and Istvan P. Szekely, Practical Econometrics, Avebury Press, England, 1992, pp. 88-89.
FIGURE 12.10 Durbin-Watson d.
The exact sampling or probability distribution of the d statistic given in Eq. (12.6.5) is difficult to derive because, as Durbin and Watson have shown, it depends in a complicated way on the X values present in a given sample.²⁴ This difficulty should be understandable because d is computed from the $\hat u_t$, which of course depend on the given X's. Therefore, unlike the t, F, or $\chi^2$ tests, there is no unique critical value that will lead to the rejection or the acceptance of the null hypothesis that there is no first-order serial correlation in the disturbances $u_t$. However, Durbin and Watson succeeded in deriving a lower bound $d_L$ and an upper bound $d_U$ such that if the computed d from Eq. (12.6.5) lies outside these critical values, a decision can be made regarding the presence of positive or negative serial correlation. Moreover, these limits depend only on the number of observations n and the number of explanatory variables, not on the values taken by these explanatory variables. These limits, for n from 6 to 200 and up to 20 explanatory variables, have been tabulated by Durbin and Watson and are reproduced in Appendix D, Table D.5.

The actual test procedure can be explained better with the aid of Figure 12.10, which shows that the limits of d are 0 and 4. These can be established as follows. Expand Eq. (12.6.5) to obtain

$$
d = \frac{\sum \hat u_t^2 + \sum \hat u_{t-1}^2 - 2\sum \hat u_t\hat u_{t-1}}{\sum \hat u_t^2}
\tag{12.6.7}
$$

Since $\sum \hat u_t^2$ and $\sum \hat u_{t-1}^2$ differ in only one observation, they are approximately equal. Therefore, setting $\sum \hat u_{t-1}^2 \approx \sum \hat u_t^2$, Eq. (12.6.7) may be written as

$$
d \approx 2\left(1 - \frac{\sum \hat u_t\hat u_{t-1}}{\sum \hat u_t^2}\right)
\tag{12.6.8}
$$

where ≈ means approximately.

Now let us define

$$
\hat\rho = \frac{\sum \hat u_t\hat u_{t-1}}{\sum \hat u_t^2}
\tag{12.6.9}
$$

24 But see the discussion of the "exact" Durbin-Watson test given later in the section.
as the sample first-order coefficient of autocorrelation, an estimator of ρ. (See footnote 9.) Using Eq. (12.6.9), we can express Eq. (12.6.8) as

$$
d \approx 2(1 - \hat\rho)
\tag{12.6.10}
$$

But since $-1 \le \hat\rho \le 1$, Eq. (12.6.10) implies that

$$
0 \le d \le 4
\tag{12.6.11}
$$

These are the bounds of d; any estimated d value must lie within these limits.

It is apparent from Eq. (12.6.10) that if $\hat\rho = 0$, then $d = 2$; that is, if there is no serial correlation (of the first order), d is expected to be about 2. Therefore, as a rule of thumb, if d is found to be 2 in an application, one may assume that there is no first-order autocorrelation, either positive or negative. If $\hat\rho = +1$, indicating perfect positive correlation in the residuals, $d \approx 0$. Therefore, the closer d is to 0, the greater the evidence of positive serial correlation. This relationship should be evident from Eq. (12.6.5): if there is positive autocorrelation, the $\hat u_t$'s will be bunched together and their differences will therefore tend to be small. As a result, the numerator sum of squares will be small compared with the denominator sum of squares, which remains fixed for any given regression.

If $\hat\rho = -1$, that is, there is perfect negative correlation among successive residuals, $d \approx 4$. Hence, the closer d is to 4, the greater the evidence of negative serial correlation. Again, looking at Eq. (12.6.5), this is understandable. For if there is negative autocorrelation, a positive $\hat u_t$ will tend to be followed by a negative $\hat u_t$ and vice versa, so that $|\hat u_t - \hat u_{t-1}|$ will usually be greater than $|\hat u_t|$. Therefore, the numerator of d will be comparatively larger than the denominator.

The mechanics of the Durbin-Watson test are as follows, assuming that the assumptions 
underlying the test are fulfilled: 

1. Run the OLS regression and obtain the residuals. 

2. Compute d from Eq. (12.6.5). (Most computer programs now do this routinely.) 

3. For the given sample size and given number of explanatory variables, find the critical $d_L$ and $d_U$ values.

4. Now follow the decision rules given in Table 12.6. For ease of reference, these decision 
rules are also depicted in Figure 12.10. 

To illustrate the mechanics, let us return to our wages-productivity regression. From the data given in Table 12.5 the estimated d value can be shown to be 0.2175, suggesting that there is positive serial correlation in the residuals. From the Durbin-Watson tables, we find that for 46 observations and one explanatory variable, $d_L = 1.475$ and $d_U = 1.566$ at the 5 percent level. Since the computed d of 0.2175 lies below $d_L$, we reject the null hypothesis of no positive serial correlation; that is, there is statistically significant evidence of positive serial correlation in the residuals.
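The d value quoted above can be checked directly from the S1 column of Table 12.5. This Python sketch computes d from its definition, the sum of squared successive differences of the residuals over their sum of squares. It also computes $\hat\rho = \sum \hat u_t\hat u_{t-1}/\sum \hat u_t^2$ and compares d with the 5 percent bounds quoted in the text. The function name `dw_and_rho` is our own. The sketch also shows why $d \approx 2(1-\hat\rho)$ is only an approximation. Exactly, $d = 2(1-\hat\rho) - (\hat u_1^2 + \hat u_n^2)/\sum \hat u_t^2$, and with only 46 observations the end-point term is not negligible.

```python
# Residuals S1 from Table 12.5 (log wages-productivity regression), 1960-2005
S1 = [
    -0.036068, -0.030780, -0.026724, -0.029160, -0.026246, -0.028348,
    -0.017504, -0.006419, 0.007094, 0.018409, 0.024713, 0.016289,
    0.025305, 0.025829, 0.023744, 0.011131, 0.018359, 0.020416,
    0.030781, 0.033023, 0.031604, 0.020801, 0.038719, 0.014416,
    0.001774, 0.001620, 0.013471, 0.013725, 0.017232, -0.004818,
    -0.006232, -0.004118, -0.005078, -0.010686, -0.023553, -0.027874,
    -0.039805, -0.041164, -0.013576, -0.006674, 0.010887, 0.007551,
    0.000453, -0.006673, -0.015650, -0.020198,
]

def dw_and_rho(resid):
    """Return the Durbin-Watson d and the first-order autocorrelation rho_hat."""
    ss = sum(e * e for e in resid)
    diff_ss = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    cross = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    return diff_ss / ss, cross / ss

d, rho_hat = dw_and_rho(S1)
d_L, d_U = 1.475, 1.566  # 5% bounds quoted in the text (46 obs., one regressor)
print(f"d = {d:.4f}, rho_hat = {rho_hat:.4f}, 2(1 - rho_hat) = {2 * (1 - rho_hat):.4f}")
print("d < d_L: reject H0 of no positive autocorrelation" if d < d_L else "d >= d_L: see Table 12.6")
```

Here d is about 0.218, while $2(1-\hat\rho)$ is close to 0.30. Both values point to strong positive autocorrelation.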

TABLE 12.6 Durbin-Watson d Test: Decision Rules

Null Hypothesis                             Decision         If
No positive autocorrelation                 Reject           0 < d < d_L
No positive autocorrelation                 No decision      d_L ≤ d ≤ d_U
No negative correlation                     Reject           4 - d_L < d < 4
No negative correlation                     No decision      4 - d_U ≤ d ≤ 4 - d_L
No autocorrelation, positive or negative    Do not reject    d_U < d < 4 - d_U

Although extremely popular, the d test has one great drawback: if d falls in the indecisive zone, one cannot conclude whether (first-order) autocorrelation exists. To solve this problem, several authors have proposed modifications of the d test, but they are rather involved and beyond the scope of this book.²⁵ In many situations, however, it has been found that the upper limit $d_U$ is approximately the true significance limit. Therefore, if d lies in the indecisive zone, one can use the following modified d test. Given the level of significance α,

1. $H_0: \rho = 0$ versus $H_1: \rho > 0$. Reject $H_0$ at level α if $d < d_U$, that is, there is statistically significant positive autocorrelation.

2. $H_0: \rho = 0$ versus $H_1: \rho < 0$. Reject $H_0$ at level α if the estimated $(4 - d) < d_U$, that is, there is statistically significant evidence of negative autocorrelation.

3. $H_0: \rho = 0$ versus $H_1: \rho \ne 0$. Reject $H_0$ at level 2α if $d < d_U$ or $(4 - d) < d_U$, that is, there is statistically significant evidence of autocorrelation, positive or negative.
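Both the bounds-based decision rules of Table 12.6 and the modified d test translate directly into code. In this Python sketch the function names and return strings are our own, and a d value that falls exactly on a boundary is assigned to the indecisive zone. The bounds $d_L$ and $d_U$ still have to be looked up in Appendix D, Table D.5, for the given n and number of regressors.

```python
def dw_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson d with the bounds-test rules (ties go to 'no decision')."""
    if not 0.0 <= d <= 4.0:
        raise ValueError("d must lie between 0 and 4")
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d <= d_upper:
        return "no decision (between d_L and d_U)"
    if d < 4.0 - d_upper:
        return "do not reject H0: no first-order autocorrelation"
    if d <= 4.0 - d_lower:
        return "no decision (between 4 - d_U and 4 - d_L)"
    return "reject H0: negative autocorrelation"


def modified_d_test(d, d_upper, alternative="positive"):
    """Modified d test using d_U as the critical value; True means reject H0: rho = 0."""
    if alternative == "positive":    # H1: rho > 0, level alpha
        return d < d_upper
    if alternative == "negative":    # H1: rho < 0, level alpha
        return (4.0 - d) < d_upper
    if alternative == "two-sided":   # H1: rho != 0, level 2 * alpha
        return d < d_upper or (4.0 - d) < d_upper
    raise ValueError("alternative must be 'positive', 'negative', or 'two-sided'")


# Wages-productivity regression: d = 0.2176 with the 5% bounds quoted in the text
print(dw_decision(0.2176, 1.475, 1.566))
print(modified_d_test(0.2176, 1.566))
```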

It may be pointed out that the indecisive zone narrows as the sample size increases, which can be seen clearly from the Durbin-Watson tables. For example, with 4 regressors and 20 observations, the 5 percent lower and upper d values are 0.894 and 1.828, respectively, but these values are 1.515 and 1.739 if the sample size is 75.

The computer program SHAZAM performs an exact d test, that is, it gives the p value, the exact probability of the computed d value. With modern computing facilities, it is no longer difficult to find the p value of the computed d statistic. Using SHAZAM (version 9) for our wages-productivity regression, we find that the p value of the computed d of 0.2176 is practically zero, which reconfirms our earlier conclusion based on the Durbin-Watson tables.

The Durbin-Watson d test has become so venerable that practitioners often forget the assumptions underlying the test. In particular, the assumptions that (1) the explanatory variables, or regressors, are nonstochastic; (2) the error term follows the normal distribution; (3) the regression models do not include the lagged value(s) of the regressand; and (4) only first-order serial correlation is taken into account are very important for the application of the d test. It should also be added that a significant d statistic does not necessarily indicate autocorrelation. Rather, it may indicate the omission of relevant variables from the model.

If a regression model contains lagged value(s) of the regressand, the d value in such cases is often around 2, which would suggest that there is no (first-order) autocorrelation in such models. Thus, there is a built-in bias against discovering (first-order) autocorrelation in such models. This does not mean that autoregressive models do not suffer from the autocorrelation problem. As a matter of fact, Durbin has developed the so-called h test for serial correlation in such models. But this test is not as powerful, in a statistical sense, as the Breusch-Godfrey test discussed shortly, so there is no need to use the h test. However, because of its historical importance, it is discussed in Exercise 12.36.

Also, if the error terms $u_t$ are not NIID, the routinely used d test may not be reliable.²⁶ In this respect the runs test discussed earlier has an advantage in that it does not make any (probability) distributional assumption about the error term. However, if the sample is large (technically infinite), we can use the Durbin-Watson d, for it can be shown that²⁷

$$
\sqrt{n}\left(1 - \tfrac{1}{2}d\right) \;\overset{a}{\sim}\; N(0, 1)
\tag{12.6.12}
$$


25 For details, see Thomas B. Fomby, R. Carter Hill, and Stanley R. Johnson, Advanced Econometric Methods, Springer-Verlag, New York, 1984, pp. 225-228.

26 For an advanced discussion, see Ron C. Mittelhammer, George G. Judge, and Douglas J. Miller, Econometric Foundations, Cambridge University Press, New York, 2000, p. 550.

27 See James Davidson, Econometric Theory, Blackwell Publishers, New York, 2000, p. 161.
That is, in large samples the d statistic as transformed in Eq. (12.6.12) follows the standard normal distribution. Incidentally, in view of the relationship between d and $\hat\rho$, the estimated first-order autocorrelation coefficient, shown in Eq. (12.6.10), it follows that

$$
\sqrt{n}\,\hat\rho \;\overset{a}{\sim}\; N(0, 1)
\tag{12.6.13}
$$

that is, in large samples, the square root of the sample size times the estimated first-order autocorrelation coefficient also follows the standard normal distribution.

As an illustration of the test, for our wages-productivity example, we found that d = 0.2176 with n = 46. Therefore, from Eq. (12.6.12) we find that

$$
\sqrt{46}\left(1 - \frac{0.2176}{2}\right) \approx 6.04
$$

Asymptotically, if the null hypothesis of zero (first-order) autocorrelation were true, the probability of obtaining a Z value (i.e., a standardized normal variable) of about 6.04 or greater is extremely small. Recall that for a standard normal distribution, the (two-tail) critical 5 percent Z value is only 1.96 and the 1 percent critical Z value is about 2.58. Although our sample size is only 46, for practical purposes it may be large enough to use the normal approximation. The conclusion remains the same, namely, that the residuals from the wages-productivity regression suffer from autocorrelation.
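The large-sample version of the d test needs only a couple of lines. This Python sketch evaluates the statistic $\sqrt{n}\,(1 - d/2)$ and computes a two-tail standard normal p value through the identity $P(|Z| \ge z) = \operatorname{erfc}(z/\sqrt{2})$, using the standard library's `math.erfc`. The function names are our own.

```python
import math

def dw_large_sample_z(d, n):
    """sqrt(n) * (1 - d/2), approximately N(0, 1) under H0 of no first-order autocorrelation."""
    return math.sqrt(n) * (1.0 - d / 2.0)

def two_tail_normal_p(z):
    """P(|Z| >= |z|) for a standard normal Z, computed as erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

z = dw_large_sample_z(0.2176, 46)
p_two_tail = two_tail_normal_p(z)
print(f"z = {z:.4f}, two-tail p = {p_two_tail:.2e}")
```

With d = 0.2176 and n = 46 the sketch gives z ≈ 6.04 and a two-tail p value on the order of $10^{-9}$.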

But the most serious problem with the d test is the assumption that the regressors are nonstochastic, that is, that their values are fixed in repeated sampling. If this is not the case, then the d test is not valid in finite (small) samples or in large samples.²⁸ And since this assumption is usually difficult to maintain in economic models involving time series data, one author contends that the Durbin-Watson statistic may not be useful in econometrics involving time series data.²⁹ In his view, more useful tests of autocorrelation are available, but they are all based on large samples. We discuss one such test below, the Breusch-Godfrey test.


IV. A General Test of Autocorrelation: The Breusch-Godfrey (BG) Test³⁰


To avoid some of the pitfalls of the Durbin-Watson d test of autocorrelation, the statisticians Breusch and Godfrey have developed a test of autocorrelation that is general in the sense that it allows for (1) stochastic regressors, such as the lagged values of the regressand; (2) higher-order autoregressive schemes, such as AR(1), AR(2), etc.; and (3) simple or higher-order moving averages of white noise error terms, such as $\varepsilon_t$ in Eq. (12.2.1).³¹

28 Ibid., p. 161.

29 Fumio Hayashi, Econometrics, Princeton University Press, Princeton, NJ, 2000, p. 45.

30 See L. G. Godfrey, "Testing Against General Autoregressive and Moving Average Error Models When the Regressors Include Lagged Dependent Variables," Econometrica, vol. 46, 1978, pp. 1293-1302, and T. S. Breusch, "Testing for Autocorrelation in Dynamic Linear Models," Australian Economic Papers, vol. 17, 1978, pp. 334-355.

31 For example, in the regression $Y_t = \beta_1 + \beta_2X_t + u_t$ the error term can be represented as $u_t = \varepsilon_t + \lambda_1\varepsilon_{t-1} + \lambda_2\varepsilon_{t-2}$, which represents a three-period moving average of the white noise error term $\varepsilon_t$.

32 The test is based on the Lagrange multiplier principle briefly mentioned in Chapter 8.

Without going into the mathematical details, which can be found in the references, the BG test, also known as the LM test,³² proceeds as follows. We use the two-variable regression model to illustrate the test, although many regressors can be added to the model. Lagged values of the regressand can also be added to the model. Let

$$
Y_t = \beta_1 + \beta_2X_t + u_t
\tag{12.6.14}
$$

Assume that the error term $u_t$ follows the pth-order autoregressive, AR(p), scheme as follows:

$$
u_t = \rho_1u_{t-1} + \rho_2u_{t-2} + \cdots + \rho_pu_{t-p} + \varepsilon_t
\tag{12.6.15}
$$

where $\varepsilon_t$ is a white noise error term as discussed previously. As you will recognize, this is simply the extension of the AR(1) scheme.

The null hypothesis $H_0$ to be tested is that

$$
H_0: \rho_1 = \rho_2 = \cdots = \rho_p = 0
\tag{12.6.16}
$$

That is, there is no serial correlation of any order. The BG test involves the following steps:

1. Estimate Eq. (12.6.14) by OLS and obtain the residuals, $\hat u_t$.

2. Regress $\hat u_t$ on the original $X_t$ (if there is more than one X variable in the original model, include them also) and $\hat u_{t-1}, \hat u_{t-2}, \ldots, \hat u_{t-p}$, where the latter are the lagged values of the estimated residuals in step 1. Thus, if p = 4, we will introduce four lagged values of the residuals as additional regressors in the model. Note that to run this regression we will have only (n - p) observations (why?). In short, run the following regression:

$$
\hat u_t = \alpha_1 + \alpha_2X_t + \hat\rho_1\hat u_{t-1} + \hat\rho_2\hat u_{t-2} + \cdots + \hat\rho_p\hat u_{t-p} + \varepsilon_t
\tag{12.6.17}
$$

and obtain $R^2$ from this (auxiliary) regression.³³

3. If the sample size is large (technically, infinite), Breusch and Godfrey have shown 
that 

(n — p)R 2 ~ X p (12.6.18) 

That is, asymptotically, $n - p$ times the $R^2$ value obtained from the auxiliary regression
(12.6.17) follows the chi-square distribution with $p$ df. If in an application $(n - p)R^2$ exceeds
the critical chi-square value at the chosen level of significance, we reject the null
hypothesis, in which case at least one $\rho$ in Eq. (12.6.15) is statistically significantly different
from zero.
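The three steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `breusch_godfrey` and `chi2_sf` are hypothetical, the regressor matrix is assumed to already contain a column of ones, and the chi-square tail probability is computed from the standard incomplete-gamma recurrence rather than from a statistics library. The simulated data are hypothetical.

```python
import math
import numpy as np


def chi2_sf(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom >= x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from df = 2 (exp) or df = 1 (erfc), then climb two df at a time with
    # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def breusch_godfrey(y, X, p):
    """BG test of H0: rho_1 = ... = rho_p = 0 (X must include a constant column)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = len(y)
    # Step 1: OLS on the original model; keep the residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    # Step 2: regress u_t on X_t and u_{t-1}, ..., u_{t-p}, using the n - p
    # observations for which all p lagged residuals exist.
    lags = np.column_stack([u[p - j:n - j] for j in range(1, p + 1)])
    Z = np.column_stack([X[p:], lags])
    u_t = u[p:]
    gamma, *_ = np.linalg.lstsq(Z, u_t, rcond=None)
    e = u_t - Z @ gamma
    r2 = 1.0 - (e @ e) / np.sum((u_t - u_t.mean()) ** 2)
    # Step 3: (n - p) R^2 is asymptotically chi-square with p df.
    stat = (n - p) * r2
    return stat, chi2_sf(stat, p)


# Hypothetical example: AR(1) errors with rho = 0.9.
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.9 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
stat, p_value = breusch_godfrey(y, X, p=2)
print(f"(n - p)R^2 = {stat:.2f}, p-value = {p_value:.3g}")
```

In applied work one would normally use a package routine instead (statsmodels, for instance, provides `acorr_breusch_godfrey`), whose conventions may differ, for example in how the first $p$ observations are handled.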

The following practical points about the BG test may be noted: 

1. The regressors included in the regression model may contain lagged values of the regressand
$Y$; that is, $Y_{t-1}, Y_{t-2}$, etc., may appear as explanatory variables. Contrast this
model with the Durbin-Watson test restriction that there may be no lagged values of the regressand
among the regressors.

2. As noted earlier, the BG test is applicable even if the disturbances follow a $p$th-order
moving average (MA) process, that is, the $u_t$ are generated as follows:

$$u_t = \varepsilon_t + \lambda_1 \varepsilon_{t-1} + \lambda_2 \varepsilon_{t-2} + \cdots + \lambda_p \varepsilon_{t-p} \qquad (12.6.19)$$

where $\varepsilon_t$ is a white noise error term, that is, the error term that satisfies all the classical
assumptions. 


33 The reason that the original regressor X is included in the model is to allow for the fact that X 
may not be strictly nonstochastic. But if it is strictly nonstochastic, it may be omitted from the model. 
On this, see Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, South-Western 
Publishing Co., 2003, p. 386. 
In the chapters on time series econometrics, we will study in some detail the $p$th-order
autoregressive and moving average processes.

3. If in Eq. (12.6.15) $p = 1$, meaning first-order autoregression, then the BG test is
known as Durbin's M test.

4. A drawback of the BG test is that the value of $p$, the length of the lag, cannot be specified
a priori. Some experimentation with the $p$ value is inevitable. Sometimes one can use
the so-called Akaike and Schwarz information criteria to select the lag length. We will discuss
these criteria in Chapter 13 and later in the chapters on time series econometrics.

5. Given the values of the $X$ variable(s) and the lagged values of $u$, the test assumes that
the variance of $u$ in Eq. (12.6.15) is homoscedastic.


Illustration of the BG Test: The Wages-Productivity Relation


To illustrate the test, we will apply it to our illustrative example. Using an AR(6) scheme,
we obtained the results shown in Exercise 12.25. From the regression results given there,
it can be seen that $(n - p) = 40$ and $R^2 = 0.7498$. Therefore, multiplying these two, we
obtain a chi-square value of 29.992. For 6 df (why?), the probability of obtaining a chi-square
value as large as 29.992 or greater is extremely small; the chi-square table in
Appendix D.4 shows that the probability of obtaining a chi-square value as large as
18.5476 or greater is only 0.005. Therefore, for the same df, the probability of obtaining
a chi-square value of about 30 must be extremely small. As a matter of fact, the actual
p value is almost zero.

Therefore, the conclusion is that, for our example, at least one of the six autocorrelations
must be nonzero.

Trying lag lengths from 1 to 6, we find that only the AR(1) coefficient is significant,
suggesting that there is no need to consider more than one lag. In essence, the BG test
in this case turns out to be Durbin's M test.
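The arithmetic of this illustration can be checked in a few lines of Python. The sketch below uses only the figures quoted in the text ($n - p = 40$, $R^2 = 0.7498$, 6 df, and the 0.005 critical value 18.5476), together with the standard closed form of the chi-square upper tail for an even number of degrees of freedom.

```python
import math

n_minus_p, r2, df = 40, 0.7498, 6
stat = n_minus_p * r2          # (n - p) R^2, about 29.992

# For even df = 2m: P(chi2 >= x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!
half = stat / 2.0
p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

critical_0005 = 18.5476        # Appendix D.4 value for 6 df at the 0.005 level
print(f"(n - p)R^2 = {stat:.3f}")
print(f"p-value = {p_value:.2e}")
print("reject H0 at the 0.5% level:", stat > critical_0005)
```

The tail probability comes out at roughly $4 \times 10^{-5}$, consistent with the statement that the p value is almost zero.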


Why So Many Tests of Autocorrelation? 

The answer to this question is that “... no particular test has yet been judged to be unequivocally
best [i.e., more powerful in the statistical sense], and thus the analyst is still in
the unenviable position of considering a varied collection of test procedures for detecting 
the presence or structure, or both, of autocorrelation.” 34 Of course, a similar argument can 
be made about the various tests of heteroscedasticity discussed in the previous chapter. 

12.7 What to Do When You Find Autocorrelation: 

Remedial Measures 


If, after applying one or more of the diagnostic tests of autocorrelation discussed in the previous
section, we find that there is autocorrelation, what then? We have four options:

1. Try to find out if the autocorrelation is pure autocorrelation and not the result of 
mis-specification of the model. As we discussed in Section 12.1, sometimes we observe 
patterns in residuals because the model is mis-specified—that is, it has excluded some 
important variables—or because its functional form is incorrect. 


34 Ron C. Mittelhammer et al., op. cit., p. 547. Recall that the power of a statistical test is 1 minus 
the probability of committing a Type II error, that is, 1 minus the probability of accepting a false 
hypothesis. The maximum power of a test is 1 and the minimum is 0. The closer the power of a test 
is to zero, the worse is that test, and the closer it is to 1, the more powerful is that test. What these 
authors are essentially saying is that there is no single most powerful test of autocorrelation. 
2. If it is pure autocorrelation, one can use an appropriate transformation of the original
model so that in the transformed model we do not have the problem of (pure) autocorrelation.
As in the case of heteroscedasticity, we will have to use some type of generalized
least-squares (GLS) method.

3. In large samples, we can use the Newey-West method to obtain standard errors of 
OLS estimators that are corrected for autocorrelation. This method is actually an extension 
of White’s heteroscedasticity-consistent standard errors method that we discussed in the 
previous chapter. 

4. In some situations we can continue to use the OLS method. 

Because of the importance of each of these topics, we devote a section to each one. 

12.8 Model Mis-Specification versus Pure Autocorrelation 

Let us return to our wages-productivity regression given in Eq. (12.5.2). There we saw that 
the d value was 0.2176 and based on the Durbin-Watson d test we concluded that there was 
positive serial correlation in the error term. Could this correlation have arisen because our model
was not correctly specified? Since the data underlying regression (12.5.1) are time series
data, it is quite possible that both wages and productivity exhibit trends. If that is the case, 
then we need to include the time or trend, t, variable in the model to see the relationship 
between wages and productivity net of the trends in the two variables. 

To test this, we included the trend variable in Eq. (12.5.2) and obtained the following
results:


$$
\begin{aligned}
\hat{Y}_t &= 0.1209 + 1.0283X_t - 0.0075t \\
\text{se} &= (0.3070) \quad (0.0776) \quad (0.0015) \qquad (12.8.1) \\
t &= (0.3939) \quad (13.2594) \quad (-4.8903) \\
R^2 &= 0.9900; \quad d = 0.4497
\end{aligned}
$$

The interpretation of this model is straightforward: Over time, the index of real wages has 
been decreasing by about 0.75 units per year. After allowing for this, if the productivity 
index went up by one unit, on average, the real compensation went up by about one unit. 
What is interesting to note is that even allowing for the trend variable, the d value is still 
very low, suggesting that Eq. (12.8.1) suffers from pure autocorrelation and not necessarily 
specification error. 
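To see mechanically what adding the trend variable involves, the sketch below fits a regression with and without a linear time trend and computes the Durbin-Watson $d = \sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2 / \sum_{t=1}^{n}\hat{u}_t^2$ for each set of residuals. The data are simulated and hypothetical (they are not the wages-productivity series), and the helper names `ols` and `durbin_watson` are illustrative.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients and residuals via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def durbin_watson(u):
    """d = sum_{t=2}^n (u_t - u_{t-1})^2 / sum_{t=1}^n u_t^2."""
    u = np.asarray(u, dtype=float)
    return float(np.sum(np.diff(u) ** 2) / np.sum(u ** 2))


# Hypothetical trending data with AR(1) errors.
rng = np.random.default_rng(1)
n = 46
t = np.arange(1, n + 1, dtype=float)
x = 50.0 + t + rng.normal(scale=2.0, size=n)
u = np.zeros(n)
for i in range(1, n):
    u[i] = 0.8 * u[i - 1] + rng.normal()
y = 10.0 + 0.7 * x - 0.2 * t + u

const = np.ones(n)
_, resid_level = ols(y, np.column_stack([const, x]))
beta_trend, resid_trend = ols(y, np.column_stack([const, x, t]))
print("d without trend:", round(durbin_watson(resid_level), 4))
print("d with trend:   ", round(durbin_watson(resid_trend), 4))
print("coefficients (const, X, t):", beta_trend.round(4))
```

Because the simulated errors are genuinely AR(1), d typically stays well below 2 even after the trend is included, which is the pattern the text reports for Eq. (12.8.1).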

How do we know that Eq. (12.8.1) is the correct specification? To test this, we regress $Y$
on $X$ and $X^2$ to test for the possibility that the real wage index may be nonlinearly related
to the productivity index. The results of this regression are as follows:

$$
\begin{aligned}
\hat{Y}_t &= -1.7843 + 2.1963X_t - 0.1752X_t^2 \\
t &= (-2.7713) \quad (7.5040) \quad (-5.2785) \qquad (12.8.2) \\
R^2 &= 0.9906 \qquad d = 0.3561
\end{aligned}
$$

We leave it to the reader to interpret these results. For present purposes, look at the
Durbin-Watson d, which is still quite low, suggesting that we still have positive serial correlation
in the residuals.

It may be safe to conclude from the preceding analysis that our wages-productivity regression
probably suffers from pure autocorrelation and not necessarily from specification



442 Part Two Relaxing the Assumptions of the Classical Model 


bias. Knowing the consequences of autocorrelation, we may therefore want to take some 
corrective action. We will do so shortly. 

Incidentally, for all the wages-productivity regressions that we have presented above, 
we applied the Jarque-Bera test of normality and found that the residuals were normally 
distributed, which is comforting because the d test assumes normality of the error term. 

12.9 Correcting for (Pure) Autocorrelation: 

The Method of Generalized Least Squares (GLS) 


Knowing the consequences of autocorrelation, especially the lack of efficiency of OLS 
estimators, we may need to remedy the problem. The remedy depends on the knowledge 
one has about the nature of interdependence among the disturbances, that is, knowledge 
about the structure of autocorrelation. 

As a starter, consider the two-variable regression model: 

$$Y_t = \beta_1 + \beta_2 X_t + u_t \qquad (12.9.1)$$

and assume that the error term follows the AR(1) scheme, namely, 

$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1 \qquad (12.9.2)$$

Now we consider two cases: (1) $\rho$ is known and (2) $\rho$ is not known but has to be estimated.

When $\rho$ Is Known

If the coefficient of first-order autocorrelation is known, the problem of autocorrelation can
be easily solved. If Eq. (12.9.1) holds true at time $t$, it also holds true at time $(t - 1)$. Hence,

$$Y_{t-1} = \beta_1 + \beta_2 X_{t-1} + u_{t-1} \qquad (12.9.3)$$

Multiplying Eq. (12.9.3) by $\rho$ on both sides, we obtain

$$\rho Y_{t-1} = \rho\beta_1 + \rho\beta_2 X_{t-1} + \rho u_{t-1} \qquad (12.9.4)$$

Subtracting Eq. (12.9.4) from Eq. (12.9.1) gives 

$$(Y_t - \rho Y_{t-1}) = \beta_1(1 - \rho) + \beta_2(X_t - \rho X_{t-1}) + \varepsilon_t \qquad (12.9.5)$$


where $\varepsilon_t = (u_t - \rho u_{t-1})$.

We can express Eq. (12.9.5) as 


$$Y_t^* = \beta_1^* + \beta_2^* X_t^* + \varepsilon_t \qquad (12.9.6)$$

where $\beta_1^* = \beta_1(1 - \rho)$, $Y_t^* = (Y_t - \rho Y_{t-1})$, $X_t^* = (X_t - \rho X_{t-1})$, and $\beta_2^* = \beta_2$.

Since the error term in Eq. (12.9.6) satisfies the usual OLS assumptions, we can apply 
OLS to the transformed variables Y* and X* and obtain estimators with all the optimum 
properties, namely, BLUE. In effect, running Eq. (12.9.6) is tantamount to using generalized
least squares (GLS) discussed in the previous chapter; recall that GLS is nothing but
OLS applied to the transformed model that satisfies the classical assumptions.

Regression (12.9.5) is known as the generalized, or quasi, difference equation. It involves
regressing $Y$ on $X$, not in the original form, but in the difference form, which is obtained by
subtracting a proportion ($= \rho$) of the value of a variable in the previous time period from its
value in the current time period. In this differencing procedure we lose one observation
because the first observation has no antecedent. To avoid this loss of one observation, the
first observation on $Y$ and $X$ is transformed as follows: $Y_1\sqrt{1 - \rho^2}$ and $X_1\sqrt{1 - \rho^2}$. 35 This
transformation is known as the Prais-Winsten transformation.
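The generalized difference transformation, with the optional Prais-Winsten treatment of the first observation, can be sketched as follows for known $\rho$. The function names `quasi_difference` and `gls_ar1` are hypothetical. One design choice differs in form, though not in substance, from Eq. (12.9.6): instead of regressing on a plain constant and recovering $\beta_1$ from $\beta_1^* = \beta_1(1 - \rho)$, the sketch transforms the column of ones along with the regressors (it becomes $1 - \rho$, or $\sqrt{1 - \rho^2}$ in the first row), so the fitted intercept is $\beta_1$ itself and the Prais-Winsten row is handled consistently.

```python
import numpy as np


def quasi_difference(y, X, rho, prais_winsten=True):
    """Generalized difference transform for AR(1) errors with known rho.

    X holds the regressors without a constant; a column of ones is added and
    transformed along with them, so its coefficient is beta_1 itself.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y_star = y[1:] - rho * y[:-1]        # Y_t - rho * Y_{t-1}
    X_star = X[1:] - rho * X[:-1]        # constant column becomes 1 - rho
    if prais_winsten:
        w = np.sqrt(1.0 - rho ** 2)      # keep observation 1, scaled by sqrt(1 - rho^2)
        y_star = np.concatenate([[w * y[0]], y_star])
        X_star = np.vstack([w * X[0], X_star])
    return y_star, X_star


def gls_ar1(y, X, rho, prais_winsten=True):
    """OLS on the transformed data, i.e., GLS when rho is known."""
    y_star, X_star = quasi_difference(y, X, rho, prais_winsten)
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta


x = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
print(gls_ar1(2.0 + 3.0 * x, x, rho=0.7))   # noise-free data: recovers [2, 3]
```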

When $\rho$ Is Not Known

Although conceptually straightforward to apply, the method of generalized difference
given in Eq. (12.9.5) is difficult to implement because $\rho$ is rarely known in practice. Therefore,
we need to find ways of estimating $\rho$. We have several possibilities.

The First-Difference Method 

Since $\rho$ lies between 0 and $\pm 1$, one could start from two extreme positions. At one extreme,
one could assume that $\rho = 0$, that is, no (first-order) serial correlation, and at the other
extreme we could let $\rho = \pm 1$, that is, perfect positive or negative correlation. As a matter
of fact, when a regression is run, one generally assumes that there is no autocorrelation
and then lets the Durbin-Watson or other test show whether this assumption is justified.
If, however, $\rho = +1$, the generalized difference equation (12.9.5) reduces to the first-difference
equation:


$$Y_t - Y_{t-1} = \beta_2(X_t - X_{t-1}) + (u_t - u_{t-1})$$


or 


$$\Delta Y_t = \beta_2 \Delta X_t + \varepsilon_t \qquad (12.9.7)$$

where $\Delta$ is the first-difference operator introduced in Eq. (12.1.10).

Since the error term in Eq. (12.9.7) is free from (first-order) serial correlation (why?), to
run the regression (12.9.7) all one has to do is form the first differences of both the regressand
and regressor(s) and run the regression on these first differences.

The first-difference transformation may be appropriate if the coefficient of autocorrelation
is very high, say in excess of 0.8, or the Durbin-Watson d is quite low. Maddala has
proposed this rough rule of thumb: Use the first-difference form whenever $d < R^2$. 36 This is
the case in our wages-productivity regression (12.5.2), where we found that $d = 0.2176$ and
$r^2 = 0.9845$. The first-difference regression for our illustrative example will be presented
shortly.
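A minimal sketch of the two ideas in this passage: the first-difference regression through the origin of Eq. (12.9.7), and Maddala's rough $d < R^2$ rule. The function names are hypothetical, and the small data set is noise-free so the answer is known exactly.

```python
import numpy as np


def first_difference_slope(y, x):
    """OLS through the origin of Delta Y_t on Delta X_t, as in Eq. (12.9.7)."""
    dy = np.diff(np.asarray(y, dtype=float))
    dx = np.diff(np.asarray(x, dtype=float))
    return float(dx @ dy / (dx @ dx))     # no intercept: sum(dx*dy) / sum(dx^2)


def maddala_rule(d, r2):
    """Rough rule of thumb: prefer the first-difference form when d < R^2."""
    return d < r2


print(maddala_rule(0.2176, 0.9845))       # the wages-productivity case
x = np.array([1.0, 3.0, 4.0, 8.0, 9.0])
print(first_difference_slope(5.0 + 3.0 * x, x))   # the level intercept 5 drops out
```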

An interesting feature of the first-difference model (12.9.7) is that there is no intercept 
in it. Hence, to estimate Eq. (12.9.7), you have to use the regression through the origin 
routine (that is, suppress the intercept term), which is now available in most software packages.
If, however, you forget to drop the intercept term in the model and estimate the following
model that includes the intercept term

$$\Delta Y_t = \beta_1 + \beta_2 \Delta X_t + \varepsilon_t \qquad (12.9.8)$$


35 The loss of one observation may not be very serious in large samples but can make a substantial 
difference in the results in small samples. Without transforming the first observation as indicated, the 
error variance will not be homoscedastic. On this, see Jeffrey Wooldridge, op. cit., p. 388. For some 
Monte Carlo results on the importance of the first observation, see Russell Davidson and James C. 
MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 1993,
Table 10.1, p. 349.

36 Maddala, op. cit., p. 232. 
then the original model must have a trend in it and $\beta_1$ represents the coefficient of the trend
variable. 37 Therefore, one “accidental” benefit of introducing the intercept term in the first-difference
model is to test for the presence of a trend variable in the original model.
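The algebra of footnote 37 can be checked numerically. The sketch below builds a hypothetical, noise-free series $Y_t = 2 + 0.5t + 3X_t$, first-differences it, and regresses $\Delta Y_t$ on a constant and $\Delta X_t$; the fitted intercept reproduces the trend coefficient 0.5.

```python
import numpy as np

t = np.arange(10, dtype=float)
x = np.array([1.0, 3.0, 4.0, 8.0, 9.0, 15.0, 14.0, 20.0, 21.0, 30.0])
y = 2.0 + 0.5 * t + 3.0 * x            # alpha_1 = 2, trend beta_1 = 0.5, beta_2 = 3

dy, dx = np.diff(y), np.diff(x)
Z = np.column_stack([np.ones_like(dx), dx])
coef, *_ = np.linalg.lstsq(Z, dy, rcond=None)
print(coef.round(6))                   # intercept = trend coefficient, slope = beta_2
```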

Returning to our wages-productivity regression (12.5.2), and given the AR(1) scheme
and a low d value in relation to $r^2$, we rerun Eq. (12.5.2) in the first-difference form without
the intercept term; remember that Eq. (12.5.2) is in the level form. The results are as
follows: 38

$$
\begin{aligned}
\widehat{\Delta Y}_t &= 0.6539\,\Delta X_t \\
t &= (11.4042) \qquad r^2 = 0.4264 \qquad d = 1.7442 \qquad (12.9.9)
\end{aligned}
$$

Compared with the level form regression (12.5.2), we see that the slope coefficient has not
changed much, but the $r^2$ value has dropped considerably. This is generally the case
because by taking the first differences we are essentially studying the behavior of variables
around their (linear) trend values. Of course, we cannot compare the $r^2$ of Eq. (12.9.9)
directly with the $r^2$ of Eq. (12.5.2) because the dependent variables in the two models
are different. 39 Also, notice that compared with the original regression, the d value has
increased dramatically, perhaps indicating that there is little autocorrelation in the first-difference
regression. 40

Another interesting aspect of the first-difference transformation relates to the stationarity
properties of the underlying time series. Return to Eq. (12.2.1), which describes the
AR(1) scheme. Now if in fact $\rho = 1$, then it is clear from Eqs. (12.2.3) and (12.2.4) that the
series $u_t$ is nonstationary, for the variances and covariances become infinite. That is why,
when we discussed this topic, we put the restriction that $|\rho| < 1$. But it is clear from
Eq. (12.2.1) that if the autocorrelation coefficient is in fact 1, then Eq. (12.2.1) becomes

$$u_t = u_{t-1} + \varepsilon_t$$


or 


$$(u_t - u_{t-1}) = \Delta u_t = \varepsilon_t \qquad (12.9.10)$$

That is, it is the first-differenced $u_t$ that becomes stationary, for it is equal to $\varepsilon_t$, which is a
white noise error term.

The point of the preceding discussion is that if the original time series are nonstationary, 
very often their first differences become stationary. And, therefore, first-difference transformation
serves a dual purpose in that it might get rid of (first-order) autocorrelation and
also render the time series stationary. We will revisit this topic in Part 5, where we discuss 
the econometrics of time series analysis in some depth. 
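A small simulation makes the contrast concrete. Assuming standard normal $\varepsilon_t$ (a hypothetical choice), the sketch generates many random-walk paths $u_t = u_{t-1} + \varepsilon_t$, shows that the variance of $u_t$ across paths grows roughly in proportion to $t$, and confirms that $\Delta u_t$ simply returns the white noise $\varepsilon_t$.

```python
import numpy as np

rng = np.random.default_rng(42)
reps, n = 2000, 200
eps = rng.normal(size=(reps, n))       # white noise, variance 1
u = np.cumsum(eps, axis=1)             # u_t = u_{t-1} + eps_t (rho = 1), starting from u = eps
du = np.diff(u, axis=1)                # Delta u_t

var_u = u.var(axis=0)                  # variance of u_t across paths, for each t
print("var(u_t) at t = 10, 100, 200:", var_u[[9, 99, 199]].round(1))  # roughly 10, 100, 200
print("var(Delta u_t):", round(float(du.var()), 3))                    # close to 1
```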

We mentioned that the first-difference transformation may be appropriate if $\rho$ is high or
d is low. Strictly speaking, the first-difference transformation is valid only if $\rho = 1$. As a


37 This is easy to show. Let $Y_t = \alpha_1 + \beta_1 t + \beta_2 X_t + u_t$. Therefore, $Y_{t-1} = \alpha_1 + \beta_1(t - 1) + \beta_2 X_{t-1} + u_{t-1}$.
Subtracting the latter from the former, you will obtain $\Delta Y_t = \beta_1 + \beta_2 \Delta X_t + \varepsilon_t$, which shows that the
intercept term in this equation is indeed the coefficient of the trend variable in the original model.
Remember that we are assuming that $\rho = 1$.

38 In Exercise 12.38 you are asked to run this model, including the constant term.

39 The comparison of r 2 in the level and first-difference form is slightly involved. For an extended 
discussion on this, see Maddala, op. cit., Chapter 6. 

40 It is not clear whether the computed d in the first-difference regression can be interpreted in the
same way as it was in the original, level form regression. However, applying the runs test, it can be 
seen that there is no evidence of autocorrelation in the residuals of the first-difference regression. 
matter of fact, there is a test, called the Berenblut-Webb test, 41 to test the hypothesis that
$\rho = 1$. The test statistic they use is called the g statistic, which is defined as follows:


$$g = \frac{\sum_{t=2}^{n} \hat{e}_t^2}{\sum_{t=1}^{n} \hat{u}_t^2} \qquad (12.9.11)$$


where $\hat{u}_t$ are the OLS residuals from the original (i.e., level form) regression and $\hat{e}_t$ are the
OLS residuals from the first-difference regression. Keep in mind that in the first-difference 
form there is no intercept. 

To test the significance of the g statistic, assuming that the level form regression contains
the intercept term, we can use the Durbin-Watson tables except that now the null
hypothesis is that $\rho = 1$ rather than the Durbin-Watson hypothesis that $\rho = 0$.

Revisiting our wages-productivity regression, for the original regression (12.5.2) we
obtain $\sum \hat{u}_t^2 = 0.0214$ and $\sum \hat{e}_t^2 = 0.0046$. Putting these values into the g statistic given in
Eq. (12.9.11), we obtain


$$g = \frac{0.0046}{0.0214} = 0.2149 \qquad (12.9.12)$$


Consulting the Durbin-Watson table for 45 observations (the number closest to 46 observations)
and 1 explanatory variable (Appendix D, Table D.5), we find that $d_L = 1.288$ and
$d_U = 1.376$ (5 percent level). Since the observed g lies below the lower limit of d, we do
not reject the hypothesis that the true $\rho = 1$. Keep in mind that although we use the same
Durbin-Watson tables, now the null hypothesis is that $\rho = 1$ and not that $\rho = 0$. In view of
this finding, the results given in Eq. (12.9.9) may be acceptable.
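A sketch of the g statistic follows. `berenblut_webb_g` forms the ratio in Eq. (12.9.11) from two residual vectors, and the hypothetical helper `g_from_data` produces those residuals for the two-variable case (level regression with intercept, first-difference regression through the origin). The final lines reuse only the sums of squares and the $d_L$ value quoted in the text.

```python
import numpy as np


def berenblut_webb_g(level_resid, fd_resid):
    """g = sum of squared first-difference residuals / sum of squared level residuals."""
    u = np.asarray(level_resid, dtype=float)
    e = np.asarray(fd_resid, dtype=float)
    return float(e @ e / (u @ u))


def g_from_data(y, x):
    """Fit the level regression (with intercept) and the first-difference
    regression (through the origin), then form g as in Eq. (12.9.11)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b                              # level-form residuals
    dy, dx = np.diff(y), np.diff(x)
    e = dy - (dx @ dy / (dx @ dx)) * dx        # first-difference residuals
    return berenblut_webb_g(u, e)


# Using the sums of squares reported in the text:
g = 0.0046 / 0.0214
d_L = 1.288                                    # lower bound quoted in the text
print(round(g, 4), "below d_L:", g < d_L)      # below d_L: do not reject rho = 1
```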


$\rho$ Based on Durbin-Watson d Statistic

If we cannot use the first-difference transformation because $\rho$ is not sufficiently close
to unity, we have an easy method of estimating it from the relationship between d and
$\rho$ established previously in Eq. (12.6.10), from which we can estimate $\rho$ as follows:


$$\hat{\rho} \approx 1 - \frac{d}{2} \qquad (12.9.13)$$


Thus, in reasonably large samples one can obtain $\hat{\rho}$ from Eq. (12.9.13) and use it to transform
the data as shown in the generalized difference equation (12.9.5). Keep in mind that
the relationship between $\rho$ and d given in Eq. (12.9.13) may not hold true in small samples,
for which Theil and Nagar have proposed a modification, which is given in Exercise 12.6.

In our wages-productivity regression (12.5.2), we obtain a d value of 0.2176. Using this
value in Eq. (12.9.13), we obtain $\hat{\rho} \approx 0.8912$. Using this estimated $\rho$ value, we can estimate
regression (12.9.5). All we have to do is subtract 0.8912 times the previous value of $Y$
from its current value and similarly subtract 0.8912 times the previous value of $X$ from
its current value and run the OLS regression on the variables thus transformed as in
Eq. (12.9.6), where $Y_t^* = (Y_t - 0.8912Y_{t-1})$ and $X_t^* = (X_t - 0.8912X_{t-1})$.
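The procedure just described can be sketched in a few lines. `rho_from_dw` applies Eq. (12.9.13), and the hypothetical helper `generalized_difference_fit` runs OLS on $Y_t^*$ and $X_t^*$ (dropping the first observation) and recovers $\beta_1$ from $\beta_1^* = \beta_1(1 - \rho)$. The small data set is noise-free and purely illustrative.

```python
import numpy as np


def rho_from_dw(d):
    """Large-sample approximation rho_hat = 1 - d/2, Eq. (12.9.13)."""
    return 1.0 - d / 2.0


def generalized_difference_fit(y, x, rho):
    """OLS on Y*_t = Y_t - rho*Y_{t-1} and X*_t = X_t - rho*X_{t-1} (first obs dropped).
    Returns (beta_1, beta_2), using beta_1* = beta_1 * (1 - rho)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    y_star = y[1:] - rho * y[:-1]
    x_star = x[1:] - rho * x[:-1]
    Z = np.column_stack([np.ones_like(x_star), x_star])
    (b1_star, b2), *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    return b1_star / (1.0 - rho), b2


rho_hat = rho_from_dw(0.2176)
print(round(rho_hat, 4))                      # the value quoted in the text

x = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
b1, b2 = generalized_difference_fit(2.0 + 3.0 * x, x, rho_hat)
print(round(float(b1), 6), round(float(b2), 6))   # noise-free data: 2 and 3
```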


$\rho$ Estimated from the Residuals

If the AR(1) scheme $u_t = \rho u_{t-1} + \varepsilon_t$ is valid, a simple way to estimate $\rho$ is to regress the
residuals $\hat{u}_t$ on $\hat{u}_{t-1}$, for the $\hat{u}_t$ are consistent estimators of the true $u_t$, as noted previously.
That is, we run the following regression:

$$\hat{u}_t = \rho \cdot \hat{u}_{t-1} + v_t \qquad (12.9.14)$$


41 I. I. Berenblut and G. I. Webb, "A New Test for Autocorrelated Errors in the Linear Regression
Model," Journal of the Royal Statistical Society, Series B, vol. 35, no. 1, 1973, pp. 33-50.
where $\hat{u}_t$ are the residuals obtained from the original (level form) regression and where $v_t$
is the error term of this regression. Note that there is no need to introduce the intercept
term in Eq. (12.9.14), for we know the OLS residuals sum to zero.

The residuals from our wages-productivity regression given in Eq. (12.5.1) are already 
shown in Table 12.5. Using these residuals, the following regression results were obtained: 


$$
\begin{aligned}
\hat{u}_t &= 0.8678\,\hat{u}_{t-1} \\
t &= (12.7359) \qquad r^2 = 0.7863 \qquad (12.9.15)
\end{aligned}
$$


As this regression shows, $\hat{\rho} = 0.8678$. Using this estimate, one can transform the original
model as per Eq. (12.9.6). Since the $\hat{\rho}$ estimated by this procedure is about the same as that
obtained from the Durbin-Watson d, the regression results using the $\hat{\rho}$ of Eq. (12.9.15)
should not be very different from those obtained from the $\hat{\rho}$ estimated from the
Durbin-Watson d. We leave it to the reader to verify this.
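Regression (12.9.14) has no intercept, so its OLS slope has the closed form $\hat{\rho} = \sum_{t=2}^{n} \hat{u}_t \hat{u}_{t-1} / \sum_{t=2}^{n} \hat{u}_{t-1}^2$. A minimal sketch (the function name is hypothetical), checked on an exactly geometric residual sequence:

```python
import numpy as np


def rho_from_residuals(u):
    """OLS slope of u_t on u_{t-1} without intercept, as in Eq. (12.9.14)."""
    u = np.asarray(u, dtype=float)
    return float(u[1:] @ u[:-1] / (u[:-1] @ u[:-1]))


print(rho_from_residuals([1.0, 0.5, 0.25, 0.125]))   # each value is half the previous one
```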


Iterative Methods of Estimating $\rho$

All the methods of estimating $\rho$ discussed previously provide us with only a single estimate
of $\rho$. But there are the so-called iterative methods that estimate $\rho$ iteratively, that is, by
successive approximation, starting with some initial value of $\rho$. Among these methods the
following may be mentioned: the Cochrane-Orcutt iterative procedure, the Cochrane-Orcutt
two-step procedure, the Durbin two-step procedure, and the Hildreth-Lu
scanning or search procedure. Of these, the most popular is the Cochrane-Orcutt iterative
method. To save space, the iterative methods are discussed by way of exercises. Remember
that the ultimate objective of these methods is to provide an estimate of $\rho$ that may be used
to obtain GLS estimates of the parameters. One advantage of the Cochrane-Orcutt iterative
method is that it can be used to estimate not only an AR(1) scheme, but also higher-order
autoregressive schemes, such as $u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + v_t$, which is AR(2). Having obtained
the two $\rho$s, one can easily extend the generalized difference equation (12.9.6). Of
course, the computer can now do all this.
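A bare-bones version of the Cochrane-Orcutt iteration for the two-variable model with AR(1) errors is sketched below, assuming the convergence tolerance and iteration cap shown (both arbitrary choices). Each pass re-estimates $\rho$ from the level-form residuals as in Eq. (12.9.14), applies the generalized difference transformation with the first observation dropped, and refits; the column of ones is transformed to $1 - \rho$, so the fitted intercept is $\beta_1$ itself. Packaged routines (for example Stata's `prais` command with its `corc` option) add refinements this sketch omits. The data are simulated and hypothetical.

```python
import numpy as np


def cochrane_orcutt(y, x, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt estimates (beta_1, beta_2) and rho for AR(1) errors."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # start from OLS
    rho = 0.0
    for _ in range(max_iter):
        u = y - X @ beta                                # level-form residuals
        rho_new = float(u[1:] @ u[:-1] / (u[:-1] @ u[:-1]))
        y_s = y[1:] - rho_new * y[:-1]                  # generalized differences,
        X_s = X[1:] - rho_new * X[:-1]                  # first observation dropped
        beta, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho


# Hypothetical data: beta = (1, 2), AR(1) errors with rho = 0.7.
rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
beta_hat, rho_hat = cochrane_orcutt(y, x)
print("beta:", beta_hat.round(3), "rho:", round(rho_hat, 3))
```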

Returning to our wages-productivity regression, and assuming an AR(1) scheme, we 
use the Cochrane-Orcutt iterative method, which gives the following estimates of $\rho$:
0.8876, 0.9944, and 0.8827. The last value of 0.8827 can now be used to transform the
original model as in Eq. (12.9.6) and estimate it by OLS. Of course, OLS on the transformed
model is simply GLS. The results are as follows:

Stata can estimate the coefficients of the model along with p. For example, if we assume 
the AR(1), Stata produces the following results: 

$$
\begin{aligned}
\hat{Y}_t^* &= 43.1042 + 0.5712X_t \\
\text{se} &= (4.3722) \quad (0.0415) \qquad (12.9.16) \\
t &= (9.8586) \quad (13.7638) \qquad r^2 = 0.8146
\end{aligned}
$$

From these results, we can see that the estimated rho ($\hat{\rho}$) is 0.8827, which is not very
much different from the $\hat{\rho}$ in Eq. (12.9.15).

As noted before, in the generalized difference equation (12.9.6) we lose one observation 
because the first observation has no antecedent. To avoid losing the first observation, we 
can use the Prais-Winsten transformation. Using this transformation, and using Stata
(version 10), we obtain the following results for our wages-productivity regression:

$$
\begin{aligned}
\widehat{\text{Rcompb}}_t &= 32.0434 + 0.6628\,\text{Prodb}_t \\
\text{se} &= (3.7182) \quad (0.0386) \qquad r^2 = 0.8799 \qquad (12.9.17)
\end{aligned}
$$
In this transformation, the $\hat{\rho}$ value was 0.9193, which was obtained after 13 iterations. It
should be pointed out that if we do not transform the first observation à la Prais-Winsten
and drop that observation, the results sometimes are substantially different, especially in
small samples. Notice that the $\hat{\rho}$ obtained here is not much different from the one obtained
in Eq. (12.9.15). 

General Comments 

There are several points about correcting for autocorrelation using the various methods discussed
above.

First, since the OLS estimators are consistent despite autocorrelation, in large samples
it makes little difference whether we estimate $\rho$ from the Durbin-Watson d, or from the regression
of the residuals in the current period on the residuals in the previous period, or
from the Cochrane-Orcutt iterative procedure because they all provide consistent estimates
of the true $\rho$. Second, the various methods discussed above are basically two-step methods.
In step 1 we obtain an estimate of the unknown $\rho$ and in step 2 we use that estimate to transform
the variables to estimate the generalized difference equation, which is basically GLS.
But since we use $\hat{\rho}$ instead of the true $\rho$, all these methods of estimation are known in the
literature as feasible GLS (FGLS) or estimated GLS (EGLS) methods.

Third, it is important to note that whenever we use an FGLS or EGLS method to estimate 
the parameters of the transformed model, the estimated coefficients will not necessarily have 
the usual optimum properties of the classical model, such as BLUE, especially in small 
samples. Without going into complex technicalities, it may be stated as a general principle 
that whenever we use an estimator in place of its true value, the estimated OLS coefficients 
may have the usual optimum properties asymptotically, that is, in large samples. Also, the 
conventional hypothesis testing procedures are, strictly speaking, valid asymptotically. In 
small samples, therefore, one has to be careful in interpreting the estimated results. 

Fourth, in using EGLS, if we do not include the first observation (as was originally the
case with the Cochrane-Orcutt procedure), not only the numerical values but also the efficiency
of the estimators can be adversely affected, especially if the sample size is small and
if the regressors are not, strictly speaking, nonstochastic. 42 Therefore, in small samples it is
important to keep the first observation à la Prais-Winsten. Of course, if the sample size is
reasonably large, EGLS, with or without the first observation, gives similar results. Incidentally,
in the literature EGLS with the Prais-Winsten transformation is known as full
EGLS, or FEGLS, for short.

12.10 The Newey-West Method of Correcting 
the OLS Standard Errors 


Instead of using the FGLS methods discussed in the previous section, we can still use 
OLS but correct the standard errors for autocorrelation by a procedure developed by 
Newey and West. 43 This is an extension of White’s heteroscedasticity-consistent standard 
errors that we discussed in the previous chapter. The corrected standard errors are known 
as HAC (heteroscedasticity- and autocorrelation-consistent) standard errors or 
simply Newey-West standard errors. We will not present the mathematics behind the 


42 This is especially so if the regressors exhibit a trend, which is quite common in economic data. 
43 W. K. Newey and K. West, "A Simple Positive Semi-Definite Heteroscedasticity and Autocorrelation
Consistent Covariance Matrix," Econometrica, vol. 55, 1987, pp. 703-708.
Newey-West procedure, for it is involved. 44 But most modern computer packages now
calculate the Newey-West standard errors. It is important to point out that the
Newey-West procedure is, strictly speaking, valid in large samples and may not be appropriate
in small samples. But in large samples we now have a method that produces
autocorrelation-corrected standard errors so that we do not have to worry about the EGLS 
transformations discussed in the previous section. Therefore, if a sample is reasonably 
large, one should use the Newey-West procedure to correct OLS standard errors not only 
in situations of autocorrelation only but also in cases of heteroscedasticity, for the HAC 
method can handle both, unlike the White method, which was designed specifically for 
heteroscedasticity. 
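Although the text does not present the mathematics, the standard Newey-West calculation is compact enough to sketch. The function below computes HAC standard errors for OLS using Bartlett weights $w_l = 1 - l/(L+1)$; the truncation lag $L$ is left to the user (packages apply their own default rules), and no small-sample degrees-of-freedom adjustment is made, so figures may differ slightly from EViews or Stata output. With $L = 0$ it reduces to White's heteroscedasticity-consistent (HC0) standard errors. The name `newey_west_se` and the simulated data are hypothetical.

```python
import numpy as np


def newey_west_se(X, u, lags):
    """HAC (Newey-West) standard errors of OLS coefficients with Bartlett weights."""
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    Xu = X * u[:, None]                    # row t is u_t * x_t'
    S = Xu.T @ Xu                          # lag-0 term: sum u_t^2 x_t x_t'
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)         # Bartlett kernel weight
        G = Xu[l:].T @ Xu[:-l]             # sum_t u_t u_{t-l} x_t x_{t-l}'
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    V = XtX_inv @ S @ XtX_inv              # sandwich covariance matrix
    return np.sqrt(np.diag(V))


# Hypothetical data with AR(1) errors and a persistent regressor.
rng = np.random.default_rng(3)
n = 200
x = np.cumsum(rng.normal(size=n)) * 0.1 + rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + e
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ beta
s2 = u @ u / (n - X.shape[1])
ols_se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
print("OLS se:", ols_se.round(4))
print("HAC se:", newey_west_se(X, u, lags=4).round(4))
```

With positively autocorrelated errors and a persistent regressor, the HAC figures will typically exceed the conventional OLS ones, in line with the comparison of Eq. (12.10.1) with Eq. (12.5.1).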

Once again let us return to our wages-productivity regression (12.5.1). We know that 
this regression suffers from autocorrelation. Our sample of 46 observations is reasonably 
large, so we can use the HAC procedure. Using EViews 4, we obtain the following regression
results:

$$
\begin{aligned}
\hat{Y}_t &= 32.7419 + 0.6704X_t \\
\text{se} &= (2.9162)^* \quad (0.0302)^* \qquad (12.10.1) \\
r^2 &= 0.9765 \qquad d = 0.1719
\end{aligned}
$$


where * denotes HAC standard errors. 

Comparing this regression with Eq. (12.5.1), we find that in both equations the estimated
coefficients and the $r^2$ value are the same. But, importantly, note that the HAC standard
errors are much greater than the OLS standard errors and therefore the HAC t ratios
are much smaller than the OLS t ratios. This shows that OLS had in fact underestimated the
true standard errors. Curiously, the d statistics in both Eqs. (12.5.1) and (12.10.1) are the
same. But don’t worry, for the HAC procedure has already taken this into account in correcting
the OLS standard errors.

12.11 OLS versus FGLS and HAC 


The practical problem facing the researcher is this: In the presence of autocorrelation, OLS 
estimators, although unbiased, consistent, and asymptotically normally distributed, are not 
efficient. Therefore, the usual inference procedure based on the t, F, and $\chi^2$ tests is no
longer appropriate. On the other hand, FGLS and HAC produce estimators that are efficient,
but the finite, or small-sample, properties of these estimators are not well documented.
This means that in small samples FGLS and HAC might actually do worse than
OLS. As a matter of fact, in a Monte Carlo study Griliches and Rao 45 found that if the sample
is relatively small and the coefficient of autocorrelation, $\rho$, is less than 0.3, OLS is as
good as or better than FGLS. As a practical matter, then, one may use OLS in small samples
in which the estimated $\rho$ is, say, less than 0.3. Of course, what is a large and what is a small
sample are relative questions, and one has to use some practical judgment. If you have only
15 to 20 observations, the sample may be small, but if you have, say, 50 or more observations,
the sample may be reasonably large.


44 If you can handle matrix algebra, the method is discussed in Greene, op. cit., 4th ed., pp. 462-463.
45 Z. Griliches and P. Rao, "Small Sample Properties of Several Two-stage Regression Methods in
the Context of Autocorrelated Errors," Journal of the American Statistical Association, vol. 64, 1969,
pp. 253-272.
12.12 Additional Aspects of Autocorrelation
Dummy Variables and Autocorrelation 

In Chapter 9 we considered dummy variable regression models. In particular, recall the 
U.S. savings-income regression model for 1970-1995 that we presented in Eq. (9.5.1), 
which for convenience is reproduced below: 

$$Y_t = \alpha_1 + \alpha_2 D_t + \beta_1 X_t + \beta_2 (D_t X_t) + u_t \qquad (12.12.1)$$

where Y = savings
X = income
D = 1 for observations in period 1982-1995
D = 0 for observations in period 1970-1981

The regression results based on this model are given in Eq. (9.5.4). Of course, this model 
was estimated with the usual OLS assumptions. 

But now suppose that $u_t$ follows a first-order autoregressive, AR(1), scheme. That is,
$u_t = \rho u_{t-1} + \varepsilon_t$. Ordinarily, if $\rho$ is known or can be estimated by one of the methods discussed
above, we can use the generalized difference method to estimate the parameters of
the model that is free from (first-order) autocorrelation. However, the presence of the 
dummy variable D poses a special problem: Note that the dummy variable simply classifies 
an observation as belonging to the first or second period. How do we transform it? One can 
follow the following procedure. 46 

1. In Eq. (12.12.1), the values of D are zero for all observations in the first period; in period 
2 the value of D for the first observation is $1/(1 - \rho)$ instead of 1, and 1 for all other 
observations. 

2. The variable $X_t$ is transformed as $(X_t - \rho X_{t-1})$. Note that we lose one observation in 
this transformation, unless one resorts to the Prais-Winsten transformation for the first 
observation, as noted earlier. 

3. The value of $D_t X_t$ is zero for all observations in the first period (note: $D_t$ is zero 
in the first period); in the second period the first observation takes the value $D_t X_t = X_t$, 
and the remaining observations in the second period are set to 
$(D_t X_t - \rho D_{t-1} X_{t-1}) = (X_t - \rho X_{t-1})$. (Note: the value of D is 1 throughout the second period.) 

As the preceding discussion points out, the critical observation is the first observation in 
the second period. If this is taken care of in the manner just suggested, there should be no 
problem in estimating regressions like Eq. (12.12.1) subject to AR(1) autocorrelation. In 
Exercise 12.37, the reader is asked to carry out such a transformation for the data on U.S. savings 
and income given in Chapter 9. 
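The three-step recipe above can be sketched in Python with NumPy. This is a minimal illustration, assuming ρ is already known (or estimated) and that the first observation is simply dropped rather than given the Prais-Winsten treatment. The function name `transform_dummy_ar1` and the numbers in the check are hypothetical. Because the transformed D is scaled by 1/(1 - ρ), its coefficient estimates α₂(1 - ρ), just as the coefficient on the column of ones estimates α₁(1 - ρ).

```python
import numpy as np

def transform_dummy_ar1(y, x, d, rho):
    """Generalized-difference transform for
    Y_t = a1 + a2*D_t + b1*X_t + b2*(D_t*X_t) + u_t,  u_t = rho*u_{t-1} + e_t.
    The first observation is dropped (no Prais-Winsten correction)."""
    y, x, d = (np.asarray(v, dtype=float) for v in (y, x, d))
    y_star = y[1:] - rho * y[:-1]
    x_star = x[1:] - rho * x[:-1]                    # step 2
    # Step 1: (D_t - rho*D_{t-1}) / (1 - rho) is 0 in period 1,
    # 1/(1 - rho) for the first period-2 observation, and 1 afterwards.
    d_star = (d[1:] - rho * d[:-1]) / (1.0 - rho)
    # Step 3: D_t X_t - rho D_{t-1} X_{t-1} is 0 in period 1, X_t for the
    # first period-2 observation, and X_t - rho X_{t-1} afterwards.
    dx = d * x
    dx_star = dx[1:] - rho * dx[:-1]
    const = np.ones_like(y_star)  # coefficient on this column is a1*(1 - rho)
    return y_star, np.column_stack([const, d_star, x_star, dx_star])

# Noise-free check with hypothetical parameters and data
rng = np.random.default_rng(0)
n, rho = 26, 0.5
x = np.linspace(100.0, 300.0, n) + rng.normal(0.0, 5.0, n)
d = (np.arange(n) >= 12).astype(float)   # period 2 starts at observation 13
y = 1.0 + 2.0 * d + 0.1 * x - 0.05 * d * x
y_star, X = transform_dummy_ar1(y, x, d, rho)
coef, *_ = np.linalg.lstsq(X, y_star, rcond=None)
# coef estimates [a1*(1 - rho), a2*(1 - rho), b1, b2]
```

With real data one would add an error term, estimate ρ first, and run OLS on `y_star` and `X` exactly as in the last two lines.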

ARCH and GARCH Models 

Just as the error term u at time t can be correlated with the error term at time (t - 1) in an 
AR(1) scheme or with various lagged error terms in a general AR(p) scheme, can there be 
autocorrelation in the variance σ² at time t with its values lagged one or more periods? Such 
an autocorrelation has been observed by researchers engaged in forecasting financial time 
series, such as stock prices, inflation rates, and foreign exchange rates. Such autocorrelation 
is given the rather daunting names autoregressive conditional heteroscedasticity (ARCH) 
if the error variance is related to the squared error term in the previous period and generalized 
autoregressive conditional heteroscedasticity (GARCH) if the error variance is related to 


46 See Maddala, op. cit., pp. 321-322. 
squared error terms several periods in the past. Since this topic belongs in the general area 
of time series econometrics, we will discuss it in some depth in the chapters on time series 
econometrics. Our objective here is to point out that autocorrelation is not confined to 
relationships between current and past error terms but can also link current and past error variances. 
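To see what autocorrelation in the error variance looks like, the following hypothetical simulation (not from the text) generates an ARCH(1) error, $u_t = \sigma_t z_t$ with $\sigma_t^2 = a_0 + a_1 u_{t-1}^2$, and compares the lag-1 autocorrelation of $u_t$ with that of $u_t^2$. The parameter values are assumptions chosen for illustration; with them the levels should be nearly uncorrelated, while the squares should show a lag-1 autocorrelation in the neighborhood of $a_1$.

```python
import numpy as np

def simulate_arch1(n, a0, a1, seed=0):
    """Simulate u_t = sigma_t * z_t with sigma_t^2 = a0 + a1 * u_{t-1}^2, z_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    u = np.empty(n)
    var = a0 / (1.0 - a1)                # start from the unconditional variance
    for t in range(n):
        u[t] = np.sqrt(var) * z[t]
        var = a0 + a1 * u[t] ** 2        # conditional variance for period t + 1
    return u

def lag1_autocorr(series):
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    return float(np.sum(s[1:] * s[:-1]) / np.sum(s * s))

u = simulate_arch1(20_000, a0=1.0, a1=0.25)
r_level = lag1_autocorr(u)        # expected to be close to 0
r_square = lag1_autocorr(u ** 2)  # expected to be in the neighborhood of a1 = 0.25
print(f"lag-1 autocorrelation of u:   {r_level:.3f}")
print(f"lag-1 autocorrelation of u^2: {r_square:.3f}")
```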

Coexistence of Autocorrelation and Heteroscedasticity 

What happens if a regression model suffers from both heteroscedasticity and autocorrelation? 
Can we solve the problem sequentially, that is, take care of heteroscedasticity first and then 
autocorrelation? As a matter of fact, one author contends that "Autoregression can only be 
detected after the heteroscedasticity is controlled for." 47 But can we develop an omnipotent 
test that can solve these and other problems (e.g., model specification) simultaneously? Yes, 
such tests exist, but their discussion would take us far afield. It is better to leave them to the 
references. 48 However, as noted earlier, we can use the HAC standard errors, for they take into 
account both autocorrelation and heteroscedasticity, provided the sample is reasonably large. 
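As a rough sketch of what HAC standard errors involve, the function below computes OLS coefficients together with Newey-West standard errors based on the Bartlett weights $1 - j/(L+1)$ for lags $j = 1, \ldots, L$. The function name and the simulated data are hypothetical, and no small-sample degrees-of-freedom adjustment is applied. Regression packages may apply one, so their numbers can differ slightly. With `lags=0` the formula reduces to White's heteroscedasticity-consistent standard errors.

```python
import numpy as np

def ols_newey_west(y, X, lags):
    """OLS coefficients with Newey-West (Bartlett-kernel) HAC standard errors.
    X must already contain a column of ones if an intercept is wanted.
    No small-sample degrees-of-freedom correction is applied."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    g = X * u[:, None]                      # row t holds x_t * u_t
    S = g.T @ g                             # lag-0 (White) term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)          # Bartlett weight
        G = g[j:].T @ g[:-j]                # sum over t of g_t g_{t-j}'
        S += w * (G + G.T)
    V = XtX_inv @ S @ XtX_inv               # sandwich covariance matrix
    return beta, np.sqrt(np.diag(V))

# Hypothetical data with AR(1) errors (rho = 0.6)
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + e[t]
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + u
beta, se_hac = ols_newey_west(y, X, lags=3)
```

The coefficients are ordinary OLS estimates; only the standard errors change, which is the pattern seen in the concluding example below.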

12.13 A Concluding Example 

In Example 10.2, we presented data on consumption, income, wealth, and interest rates for 
the U.S., all in real terms. Based on these data, we estimated the following consumption 
function for the U.S. for the period 1947-2000, regressing the log of consumption on the 
logs of income and wealth and on the level of the interest rate. We did not express the interest 
rate in log form because some of the real interest rate figures were negative. 

Dependent Variable: ln(CONSUMPTION) 
Method: Least Squares 
Sample: 1947-2000 
Included observations: 54 

Variable        Coefficient   Std. Error   t-Statistic   Prob. 
C               -0.467711     0.042778     -10.93343     0.0000 
ln(INCOME)       0.804873     0.017498      45.99836     0.0000 
ln(WEALTH)       0.201270     0.017593      11.44060     0.0000 
INTEREST        -0.002689     0.000762      -3.529265    0.0009 

R-squared            0.999560    Mean dependent var.   7.826093 
Adjusted R-squared   0.999533    S.D. dependent var.   0.552368 
S.E. of regression   0.011934    F-statistic           37832.59 
Sum squared resid.   0.007121    Prob. (F-statistic)   0.000000 
Log likelihood       164.5880    Durbin-Watson stat.   1.289219 


As expected, the income and wealth elasticities are positive and the interest rate semielasticity 
is negative. Although the estimated coefficients seem to be individually highly statistically 
significant, we need to check for possible autocorrelation in the error term. As we know, in the 
presence of autocorrelation, the estimated standard errors may be underestimated. Examining 


47 Lois W. Sayrs, Pooled Time Series Analysis, Sage Publications, California, 1989, p. 19. 

48 See Jeffrey M. Wooldridge, op. cit., pp. 402-403, and A. K. Bera and C. M. Jarque, "Efficient Tests 
for Normality, Homoscedasticity and Serial Independence of Regression Residuals: Monte Carlo 
Evidence," Economics Letters, vol. 7, 1981, pp. 313-318. 
the Durbin-Watson d statistic, we find that the error terms in the consumption function appear to 
suffer from (first-order) autocorrelation (check this out). 
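One way to check the Durbin-Watson statistic is to compute it directly from the OLS residuals. The sketch below (function names are hypothetical) does that and also applies the rough large-sample relation $\hat\rho \approx 1 - d/2$ to the reported d of 1.289219, which gives about 0.355. Whether d actually signals autocorrelation still has to be judged against the tabulated $d_L$ and $d_U$ values, and the AR(1) estimate reported next (0.612) is well above this figure, a reminder that $1 - d/2$ is only a crude guide to ρ.

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_{t=2}^{n} (u_t - u_{t-1})^2 / sum_{t=1}^{n} u_t^2"""
    e = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def rho_from_d(d):
    """Large-sample approximation: rho_hat is roughly 1 - d/2."""
    return 1.0 - d / 2.0

rho_hat = rho_from_d(1.289219)   # d reported for the OLS consumption function
print(round(rho_hat, 4))         # prints 0.3554
```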

To confirm this, we estimated the consumption function, allowing for AR(1) autocorrelation. 
The results are as follows: 

Dependent Variable: ln(CONSUMPTION) 
Method: Least Squares 
Sample (adjusted): 1948-2000 
Included observations: 53 after adjustments 
Convergence achieved after 11 iterations 

Variable        Coefficient   Std. Error   t-Statistic   Prob. 
C               -0.399833     0.070954     -5.635112     0.0000 
ln(INCOME)       0.845654     0.029275     28.89313      0.0000 
ln(WEALTH)       0.159131     0.027462      5.794501     0.0000 
INTEREST         0.001214     0.000925      1.312986     0.1954 
AR(1)            0.612443     0.100591      6.088462     0.0000 

R-squared            0.999688    Mean dependent var.   7.843871 
Adjusted R-squared   0.999662    S.D. dependent var.   0.541833 
S.E. of regression   0.009954    F-statistic           38503.91 
Sum squared resid.   0.004756    Prob. (F-statistic)   0.00000 
Log likelihood       171.7381    Durbin-Watson stat.   1.874724 


These results clearly show that our regression suffers from autocorrelation: the estimated 
AR(1) coefficient is highly significant (t ≈ 6.09). We leave it to the reader to remove 
autocorrelation using some of the transformations discussed in this chapter. You may use the 
estimated ρ of 0.6124 for the transformations. Below, we present the 
results based on Newey-West (HAC) standard errors that take into account autocorrelation. 
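If you want to carry out the transformation with the estimated ρ of 0.6124, the sketch below shows the generalized difference transformation with the optional Prais-Winsten treatment of the first observation (scaling it by $\sqrt{1-\rho^2}$). It assumes the design matrix already contains a column of ones, which is transformed along with the other regressors so that OLS on the transformed data returns β₁ itself rather than β₁(1 - ρ). The function name and the noise-free test data are hypothetical.

```python
import numpy as np

def generalized_difference(y, X, rho, prais_winsten=True):
    """Transform y and X (including the constant column) for AR(1) errors.
    If prais_winsten is True the first observation is kept and scaled by
    sqrt(1 - rho^2); otherwise it is dropped."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    y_star = y[1:] - rho * y[:-1]
    X_star = X[1:] - rho * X[:-1]        # the constant column becomes 1 - rho
    if prais_winsten:
        c = np.sqrt(1.0 - rho ** 2)
        y_star = np.concatenate([[c * y[0]], y_star])
        X_star = np.vstack([c * X[0], X_star])
    return y_star, X_star

# Noise-free check on hypothetical data: OLS on the transformed data
# recovers the original coefficients, intercept included.
rng = np.random.default_rng(2)
n = 54
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
b_true = np.array([-0.4, 0.8, 0.2])
y = X @ b_true
y_s, X_s = generalized_difference(y, X, rho=0.6124)
b_hat, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
```

Setting `prais_winsten=False` drops the first observation instead, which, as noted in this chapter, can matter in small samples.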

Dependent Variable: LCONSUMPTION 
Method: Least Squares 
Sample: 1947-2000 
Included observations: 54 
Newey-West HAC Standard Errors & Covariance (lag truncation = 3) 

Variable        Coefficient   Std. Error   t-Statistic   Prob. 
C               -0.467714     0.043937     -10.64516     0.0000 
LINCOME          0.804871     0.017117      47.02132     0.0000 
LWEALTH          0.201272     0.015447      13.02988     0.0000 
INTEREST        -0.002689     0.000880      -3.056306    0.0036 

R-squared            0.999560    Mean dependent var.   7.826093 
Adjusted R-squared   0.999533    S.D. dependent var.   0.552368 
S.E. of regression   0.011934    F-statistic           37832.71 
Sum squared resid.   0.007121    Prob. (F-statistic)   0.000000 
Durbin-Watson stat.  1.289237 


The major difference between the first and the last of the above regressions lies in the 
standard errors: the coefficient estimates are essentially identical (both are OLS estimates), 
but the standard errors have changed, by between about 2 and 15 percent. Despite this, the 
estimated slope coefficients are still highly statistically significant. However, there is no 
guarantee that this will always be the case. 
Summary and Conclusions 


1. If the assumption of the classical linear regression model that the errors or disturbances 
$u_t$ entering into the population regression function (PRF) are random or 
uncorrelated is violated, the problem of serial correlation, or autocorrelation, arises. 

2. Autocorrelation can arise for several reasons, such as inertia or sluggishness of 
economic time series, specification bias resulting from excluding important variables 
from the model or using an incorrect functional form, the cobweb phenomenon, data massaging, 
and data transformation. As a result, it is useful to distinguish between pure 
autocorrelation and "induced" autocorrelation arising from one or more of the factors just 
discussed. 

3. Although in the presence of autocorrelation the OLS estimators remain unbiased, consistent, 
and asymptotically normally distributed, they are no longer efficient. As a consequence, 
the usual t, F, and χ² tests cannot be legitimately applied. Hence, remedial 
measures may be called for. 

4. The remedy depends on the nature of the interdependence among the disturbances $u_t$. 
But since the disturbances are unobservable, the common practice is to assume that 
they are generated by some mechanism. 

5. The mechanism that is commonly assumed is the Markov first-order autoregressive 
scheme, which assumes that the disturbance in the current time period is linearly related 
to the disturbance term in the previous time period, the coefficient of autocorrelation 
ρ providing the extent of the interdependence. This mechanism is known as the 
AR(1) scheme. 

6. If the AR(1) scheme is valid and the coefficient of autocorrelation is known, the serial 
correlation problem can be easily attacked by transforming the data following the generalized 
difference procedure. The AR(1) scheme can be easily generalized to an 
AR(p) scheme. One can also assume a moving average (MA) mechanism or a mixture of AR 
and MA schemes, known as ARMA. This topic will be discussed in the chapters on time 
series econometrics. 

7. Even if we use an AR(1) scheme, the coefficient of autocorrelation is not known a priori. 
We considered several methods of estimating ρ, such as the Durbin-Watson d, 
Theil-Nagar modified d, Cochrane-Orcutt (C-O) iterative procedure, C-O two-step 
method, and the Durbin two-step procedure. In large samples, these methods generally 
yield similar estimates of ρ, although in small samples they perform differently. In 
practice, the C-O iterative method has become quite popular. 

8. Having estimated ρ by any of the methods just discussed, we can use the generalized difference 
method to estimate the parameters of the transformed model by OLS, which essentially 
amounts to GLS. But since we estimate ρ (by ρ̂), we call the method of estimation feasible, 
or estimated, GLS, or FGLS or EGLS for short. 

9. In using EGLS, one has to be careful in dropping the first observation, for in small 
samples the inclusion or exclusion of the first observation can make a dramatic difference 
in the results. Therefore, in small samples it is advisable to transform the first observation 
according to the Prais-Winsten procedure. In large samples, however, it 
makes little difference whether the first observation is included or not. 

10. It is very important to note that the method of EGLS has the usual optimum statistical 
properties only in large samples. In small samples, OLS may actually do better than 
EGLS, especially if ρ < 0.3. 

11. Instead of using EGLS, we can still use OLS but correct the standard errors for autocorrelation 
by the Newey-West HAC procedure. Strictly speaking, this procedure is 
valid in large samples. One advantage of the HAC procedure is that it corrects not only 
for autocorrelation but also for heteroscedasticity, if it is present. 

12. Of course, before remediation comes detection of autocorrelation. There are formal and 
informal methods of detection. Among the informal methods, one can simply plot the 
actual or standardized residuals, or plot current residuals against past residuals. Among 
formal methods, one can use the runs test, the Durbin-Watson d test, the asymptotic 
normality test, the Berenblut-Webb test, and the Breusch-Godfrey (BG) test. Of these, 
the most popular and routinely used is the Durbin-Watson d test. Despite its hoary past, 
this test has severe limitations. It is better to use the BG test, for it is much more general 
in that it allows for both AR and MA error structures as well as the presence of a lagged 
regressand as an explanatory variable. But keep in mind that it is a large-sample test. 

13. In this chapter we also discussed very briefly the treatment of autocorrelation in the 
presence of dummy regressors. 


EXERCISES 

Questions 

12.1. State whether the following statements are true or false. Briefly justify your answer. 

a. When autocorrelation is present, OLS estimators are biased as well as 
inefficient. 

b. The Durbin-Watson d test assumes that the variance of the error term $u_t$ is 
homoscedastic. 

c. The first-difference transformation to eliminate autocorrelation assumes that the 
coefficient of autocorrelation ρ is -1. 

d. The R² values of two models, one involving regression in the first-difference 
form and another in the level form, are not directly comparable. 

e. A significant Durbin-Watson d does not necessarily mean there is autocorrelation 
of the first order. 

f. In the presence of autocorrelation, the conventionally computed variances and 
standard errors of forecast values are inefficient. 

g. The exclusion of an important variable(s) from a regression model may give a 
significant d value. 

h. In the AR(1) scheme, a test of the hypothesis that ρ = 1 can be made by the 
Berenblut-Webb g statistic as well as the Durbin-Watson d statistic. 

i. In the regression of the first difference of Y on the first differences of X, if there 
is a constant term and a linear trend term, it means in the original model there is 
a linear as well as a quadratic trend term. 

12.2. Given a sample of 50 observations and 4 explanatory variables, what can you say 
about autocorrelation if (a) d = 1.05? (b) d = 1.40? (c) d = 2.50? (d) d = 3.97? 

12.3. In studying the movement in the production workers' share in the value added (i.e., 
labor's share), the following models were considered by Gujarati:* 

Model A: $Y_t = \beta_0 + \beta_1 t + u_t$ 

Model B: $Y_t = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + u_t$ 

*Damodar Gujarati, "Labor's Share in Manufacturing Industries," Industrial and Labor Relations Review, 
vol. 23, no. 1, October 1969, pp. 65-75. 
where Y = labor's share and t = time. Based on annual data for 1949-1964, the 
following results were obtained for the primary metal industry: 


Model A: $\hat{Y}_t = 0.4529 - 0.0041t$                     R² = 0.5284    d = 0.8252 
                        (-3.9608) 

Model B: $\hat{Y}_t = 0.4786 - 0.0127t + 0.0005t^2$         R² = 0.6629    d = 1.82 
                        (-3.2724)   (2.7777) 

where the figures in the parentheses are t ratios. 

a. Is there serial correlation in model A? In model B? 

b. What accounts for the serial correlation? 

c. How would you distinguish between “pure” autocorrelation and specification 
bias? 

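12.4. Detecting autocorrelation: von Neumann ratio test.* Assuming that the residuals $\hat u_t$ 
are random drawings from a normal distribution, von Neumann has shown that for 
large n, the ratio 

$$\frac{\delta^2}{s^2} = \frac{\sum (\hat u_i - \hat u_{i-1})^2/(n-1)}{\sum (\hat u_i - \bar{\hat u})^2/n} \qquad \text{Note: } \bar{\hat u} = 0 \text{ in OLS}$$

called the von Neumann ratio, is approximately normally distributed with mean 

$$E\left(\frac{\delta^2}{s^2}\right) = \frac{2n}{n-1}$$

and variance 

$$\operatorname{var}\left(\frac{\delta^2}{s^2}\right) = 4n^2 \, \frac{n-2}{(n+1)(n-1)^3}$$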

a. If n is sufficiently large, how would you use the von Neumann ratio to test for 
autocorrelation? 

b. What is the relationship between the Durbin-Watson d and the von Neumann 
ratio? 

c. The d statistic lies between 0 and 4. What are the corresponding limits for the 
von Neumann ratio? 

d. Since the ratio depends on the assumption that the u's are random drawings from a 
normal distribution, how valid is this assumption for the OLS residuals? 

e. Suppose in an application the ratio was found to be 2.88 with 100 observations. 
Test the hypothesis that there is no serial correlation in the data. 

Note: B. I. Hart has tabulated the critical values of the von Neumann ratio for 
sample sizes of up to 60 observations.† 
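For large n, the test in 12.4 amounts to a z statistic. The sketch below is a hypothetical helper that computes the von Neumann ratio from a residual series and standardizes it with the mean and variance given above. It relies on the large-sample normal approximation and is not a substitute for Hart's tables in small samples; the simulated residuals are an assumption for illustration.

```python
import math
import numpy as np

def von_neumann_test(resid):
    """Return the von Neumann ratio and its large-sample z statistic."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    num = np.sum(np.diff(e) ** 2) / (n - 1)
    den = np.sum((e - e.mean()) ** 2) / n
    ratio = float(num / den)
    mean = 2.0 * n / (n - 1)
    var = 4.0 * n ** 2 * (n - 2) / ((n + 1) * (n - 1) ** 3)
    return ratio, (ratio - mean) / math.sqrt(var)

# Hypothetical independent residuals, so z should usually be small
rng = np.random.default_rng(3)
ratio, z = von_neumann_test(rng.normal(size=100))
```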

12.5. In a sequence of 17 residuals, 11 positive and 6 negative, the number of runs was 3. 

Is there evidence of autocorrelation? Would the answer change if there were 14 runs? 
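As a generic aid for exercises such as 12.5 and 12.20, the sketch below counts the runs in a sequence of residual signs and computes the large-sample normal approximation used for the runs test in this chapter, with mean $2N_1N_2/N + 1$ and variance $2N_1N_2(2N_1N_2 - N)/[N^2(N-1)]$, where $N_1$ and $N_2$ are the numbers of positive and negative signs. The helper name is hypothetical, positive values are counted as "+" and everything else as "-", and with very few residuals exact tables of critical runs are preferable to the normal approximation.

```python
import math

def runs_test(signs):
    """Normal approximation to the runs test.
    signs: sequence of residual signs, e.g. +1/-1. Returns (R, E(R), sd(R), z)."""
    s = [v > 0 for v in signs]
    n1 = sum(s)                  # number of + signs
    n2 = len(s) - n1             # number of - signs
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(s, s[1:]))
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    sd = math.sqrt(var)
    return runs, mean, sd, (runs - mean) / sd

# Hypothetical signs: (++)(--)(+) gives 3 runs
R, mean_R, sd_R, z = runs_test([1, 1, -1, -1, 1])
```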


*J. von Neumann, "Distribution of the Ratio of the Mean Square Successive Difference to the 
Variance," Annals of Mathematical Statistics, vol. 12, 1941, pp. 367-395. 
†The table may be found in Johnston, op. cit., 3d ed., p. 559. 
12.6. Theil-Nagar ρ estimate based on d statistic. Theil and Nagar have suggested that, 
in small samples, instead of estimating ρ as (1 - d/2), it should be estimated as 

$$\hat\rho = \frac{n^2(1 - d/2) + k^2}{n^2 - k^2}$$

where n = total number of observations, d = Durbin-Watson d, and k = number 
of coefficients (including the intercept) to be estimated. 

Show that for large n, this estimate of ρ is equal to the one obtained by the simpler 
formula (1 - d/2). 
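A quick numerical look at the Theil-Nagar formula (this does not replace the requested proof): the hypothetical helper below evaluates ρ̂ for an assumed d = 1.2 and k = 4 at increasing n, and the values approach the simple estimate 1 - d/2 = 0.4.

```python
def theil_nagar_rho(d, n, k):
    """Theil-Nagar small-sample estimate of rho from the Durbin-Watson d.
    n = number of observations, k = number of coefficients incl. intercept."""
    return (n ** 2 * (1.0 - d / 2.0) + k ** 2) / (n ** 2 - k ** 2)

# Hypothetical d = 1.2 and k = 4; compare with 1 - d/2 = 0.4
for n in (20, 200, 2000):
    print(n, round(theil_nagar_rho(1.2, n, 4), 4))
```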

12.7. Estimating ρ: The Hildreth-Lu scanning or search procedure.* Since in the first-order 
autoregressive scheme 

$$u_t = \rho u_{t-1} + \varepsilon_t$$

ρ is expected to lie between -1 and +1, Hildreth and Lu suggest a systematic 
"scanning" or search procedure to locate it. They recommend selecting ρ between 
-1 and +1 using, say, 0.1 unit intervals and transforming the data by the generalized 
difference equation (12.6.5). Thus, one may choose ρ from -0.9, -0.8, ..., 0.8, 
0.9. For each chosen ρ we run the generalized difference equation and obtain the 
associated RSS: $\sum \hat u_t^2$. Hildreth and Lu suggest choosing that ρ which minimizes the 
RSS (hence maximizing the R²). If further refinement is needed, they suggest using 
smaller unit intervals, say, 0.01 units such as -0.99, -0.98, ..., 0.90, 0.91, and so on. 

a. What are the advantages of the Hildreth-Lu procedure? 

b. How does one know that the ρ value ultimately chosen to transform the data will, 
in fact, guarantee minimum $\sum \hat u_t^2$? 
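The scanning procedure is easy to express as a loop. The sketch below is a hypothetical implementation for the two-variable model: for each ρ on the grid it runs the generalized difference regression, dropping the first observation and transforming the constant column to (1 - ρ) so that the first coefficient is β₁, and keeps the ρ with the smallest RSS. A second pass with 0.01 steps around the coarse minimum mimics the refinement step. The data are simulated with an assumed ρ of 0.7.

```python
import numpy as np

def hildreth_lu(y, x, grid):
    """For each rho in grid, run the generalized difference regression
    (first observation dropped) and return (rho, RSS, [beta1, beta2])
    for the rho with the smallest RSS."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    best = None
    for rho in grid:
        ys = y[1:] - rho * y[:-1]
        Xs = np.column_stack([np.full(ys.size, 1.0 - rho), x[1:] - rho * x[:-1]])
        coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        rss = float(np.sum((ys - Xs @ coef) ** 2))
        if best is None or rss < best[1]:
            best = (float(rho), rss, coef)
    return best

# Hypothetical data with AR(1) errors, true rho = 0.7
rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 2.0 + 1.5 * x + u

coarse = hildreth_lu(y, x, np.round(np.arange(-0.9, 0.91, 0.1), 2))
# Refine with 0.01 steps around the coarse minimum, staying inside (-1, 1)
fine_grid = np.round(np.arange(coarse[0] - 0.1, coarse[0] + 0.1001, 0.01), 2)
fine = hildreth_lu(y, x, fine_grid[np.abs(fine_grid) < 1])
```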

12.8. Estimating ρ: The Cochrane-Orcutt (C-O) iterative procedure.† As an illustration 
of this procedure, consider the two-variable model: 

$$Y_t = \beta_1 + \beta_2 X_t + u_t \qquad (1)$$

and the AR(1) scheme 

$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1 \qquad (2)$$

Cochrane and Orcutt then recommend the following steps to estimate ρ. 

1. Estimate Eq. (1) by the usual OLS routine and obtain the residuals, $\hat u_t$. 
Incidentally, note that you can have more than one X variable in the model. 

2. Using the residuals obtained in step 1, run the following regression: 

$$\hat u_t = \hat\rho \hat u_{t-1} + v_t \qquad (3)$$

which is the empirical counterpart of Eq. (2).‡ 

3. Using the ρ̂ obtained in Eq. (3), estimate the generalized difference equation (12.9.6). 


*C. Hildreth and J. Y. Lu, "Demand Relations with Autocorrelated Disturbances," Michigan State 
University, Agricultural Experiment Station, Tech. Bull. 276, November 1960. 
†D. Cochrane and G. H. Orcutt, "Applications of Least-Squares Regressions to Relationships 
Containing Autocorrelated Error Terms," Journal of the American Statistical Association, vol. 44, 1949, 
pp. 32-61. 

‡Note that $\hat\rho = \sum \hat u_t \hat u_{t-1} / \sum \hat u_{t-1}^2$ (why?). Although biased, ρ̂ is a consistent estimator of the true ρ. 
4. Since a priori it is not known whether the ρ̂ obtained from Eq. (3) is the best estimate 
of ρ, substitute the values of $\hat\beta_1^*$ and $\hat\beta_2^*$ obtained in step 3 into the original 
regression Eq. (1) and obtain the new residuals, say, $\hat u_t^*$, as 

$$\hat u_t^* = Y_t - \hat\beta_1^* - \hat\beta_2^* X_t \qquad (4)$$

which can be easily computed since $Y_t$, $X_t$, $\hat\beta_1^*$, and $\hat\beta_2^*$ are all known. 

5. Now estimate the following regression: 

$$\hat u_t^* = \hat\rho^* \hat u_{t-1}^* + w_t \qquad (5)$$

which is similar to Eq. (3) and thus provides the second-round estimate of ρ. 
Since we do not know whether this second-round estimate of ρ is the best estimate 
of the true ρ, we go on to the third-round estimate, and so on. That is why the C-O 
procedure is called an iterative procedure. But how long should we go on this 
(merry-)go-round? The general recommendation is to stop carrying out iterations 
when the successive estimates of ρ differ by a small amount, say, by less than 0.01 or 
0.005. In our wages-productivity example, it took about three iterations before we 
stopped. 

a. Use the Cochrane-Orcutt iterative procedure to estimate ρ for the wages-productivity 
regression, Eq. (12.5.2). How many iterations were involved before 
you obtained the "final" estimate of ρ? 

b. Using the final estimate of ρ obtained in (a), estimate the wages-productivity 
regression, dropping the first observation as well as retaining the first observation. 
What difference do you see in the results? 

c. Do you think that it is important to keep the first observation in transforming the 
data to solve the autocorrelation problem? 
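The five steps can be sketched as a loop. In this hypothetical implementation the constant column is transformed along with X, so the coefficients from step 3 are already in the units of Eq. (1) and can be plugged straight into step 4. Iteration stops when successive estimates of ρ differ by less than 0.005, one of the tolerances mentioned above, and the first observation is dropped in each round. The simulated data (true ρ = 0.6) are assumptions for illustration only.

```python
import numpy as np

def cochrane_orcutt(y, X, tol=0.005, max_iter=50):
    """Iterative Cochrane-Orcutt estimation of y = X b + u, u_t = rho u_{t-1} + e_t.
    X must include a column of ones; it is transformed too, so b is always
    in the units of the original equation. Returns (b, rho, iterations)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)            # step 1: OLS
    rho_old = None
    for it in range(1, max_iter + 1):
        u = y - X @ b                                     # residuals (steps 1 and 4)
        rho = float(u[1:] @ u[:-1] / (u[:-1] @ u[:-1]))   # steps 2 and 5
        ys = y[1:] - rho * y[:-1]                         # step 3: generalized difference
        Xs = X[1:] - rho * X[:-1]
        b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        if rho_old is not None and abs(rho - rho_old) < tol:
            return b, rho, it
        rho_old = rho
    return b, rho, max_iter

# Hypothetical data with AR(1) errors, true rho = 0.6
rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + u
b_co, rho_co, iterations = cochrane_orcutt(y, X)
```

Stopping after the first pass through the loop (one estimate of ρ, then the generalized difference regression) gives the two-step version described in the next exercise.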

12.9. Estimating ρ: The Cochrane-Orcutt two-step procedure. This is a shortened version 
of the C-O iterative procedure. In step 1, we estimate ρ from the first iteration, 
that is, from Eq. (3) in the preceding exercise, and in step 2 we use that estimate of 
ρ to run the generalized difference equation, as in Eq. (4) in the preceding exercise. 
Sometimes in practice, this two-step method gives results quite similar to those 
obtained from the more elaborate C-O iterative procedure. 

Apply the C-O two-step method to the illustrative wages-productivity 
regression (12.5.1) given in this chapter and compare your results with those obtained 
from the iterative method. Pay special attention to the first observation in the 
transformation. 

12.10. Estimating ρ: Durbin's two-step method.* To explain this method, we can write the 
generalized difference equation (12.9.5) equivalently as follows: 

$$Y_t = \beta_1(1 - \rho) + \beta_2 X_t - \beta_2 \rho X_{t-1} + \rho Y_{t-1} + \varepsilon_t \qquad (1)$$

Durbin suggests the following two-step procedure to estimate ρ. First, treat Eq. (1) 
as a multiple regression model, regressing $Y_t$ on $X_t$, $X_{t-1}$, and $Y_{t-1}$, and treat the 
estimated value of the regression coefficient of $Y_{t-1}$ (= ρ̂) as an estimate of ρ. 
Second, having obtained ρ̂, use it to estimate the parameters of the generalized difference 
equation (12.9.5) or its equivalent, Eq. (12.9.6). 


*J. Durbin, "Estimation of Parameters in Time-Series Regression Models," Journal of the Royal Statistical 
Society, series B, vol. 22, 1960, pp. 139-153. 
a. Apply the Durbin two-step method to the wages-productivity example discussed 
in this chapter and compare your results with those obtained from the 
Cochrane-Orcutt iterative procedure and the C-O two-step method. Comment 
on the "quality" of your results. 

b. If you examine Eq. (1) above, you will observe that the coefficient of 
$X_{t-1}$ (= $-\rho\beta_2$) is equal to minus 1 times the product of the coefficient of 
$X_t$ (= $\beta_2$) and the coefficient of $Y_{t-1}$ (= ρ). How would you test that the coefficients 
obey the preceding restriction? 
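A minimal sketch of Durbin's procedure for the two-variable case, with a hypothetical function name and simulated data (true ρ = 0.5): step 1 is an OLS regression of $Y_t$ on a constant, $X_t$, $X_{t-1}$, and $Y_{t-1}$; step 2 runs the generalized difference regression with the resulting ρ̂, dropping the first observation. The first-step coefficients are returned as well, so the restriction in part (b) can at least be eyeballed; a formal test is left to the exercise.

```python
import numpy as np

def durbin_two_step(y, x):
    """Durbin's two-step estimate of rho and of [beta1, beta2]."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Step 1: regress Y_t on a constant, X_t, X_{t-1} and Y_{t-1};
    # the coefficient on Y_{t-1} is taken as rho_hat.
    Z = np.column_stack([np.ones(y.size - 1), x[1:], x[:-1], y[:-1]])
    g, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    rho = float(g[3])
    # Step 2: generalized difference regression with rho_hat
    # (first observation dropped; constant column transformed to 1 - rho).
    ys = y[1:] - rho * y[:-1]
    Xs = np.column_stack([np.full(ys.size, 1.0 - rho), x[1:] - rho * x[:-1]])
    b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return rho, b, g

# Hypothetical data with AR(1) errors, true rho = 0.5
rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u
rho_d, b_d, g = durbin_two_step(y, x)
# The coefficient on X_{t-1}, g[2], should be close to -g[1] * g[3]
```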

12.11. In measuring returns to scale in electricity supply, Nerlove used cross-sectional 
data on 145 privately owned utilities in the United States for the year 1955 and regressed 
the log of total cost on the logs of output, wage rate, price of capital, and 
price of fuel. He found that the residuals estimated from this regression exhibited 
"serial" correlation, as judged by the Durbin-Watson d. To seek a remedy, he plotted 
the estimated residuals against the log of output and obtained Figure 12.11. 

a. What does Figure 12.11 show? 

b. How can you get rid of “serial” correlation in the preceding situation? 

12.12. The residuals from a regression when plotted against time gave the scattergram in 
Figure 12.12. The encircled “extreme” residual is called an outlier. An outlier is an 
observation whose value exceeds the values of other observations in the sample by a 


FIGURE 12.11 Regression residuals from the Nerlove study. (Adapted from Marc Nerlove, "Returns 
to Scale in Electricity Supply," in Carl F. Christ et al., Measurement in Economics, Stanford 
University Press, Stanford, Calif., 1963.) 

FIGURE 12.12 Hypothetical regression residuals plotted against time. 
large amount, perhaps three or four standard deviations away from the mean value of 
all the observations. 

a. What are the reasons for the existence of the outlier(s)? 

b. If there is an outlier(s), should that observation(s) be discarded and the regression 
run on the remaining observations? 

c. Is the Durbin-Watson d applicable in the presence of the outlier(s)? 

12.13. Based on the Durbin-Watson d statistic, how would you distinguish "pure" autocorrelation 
from specification bias? 

12.14. Suppose in the model 

$$Y_t = \beta_1 + \beta_2 X_t + u_t$$

the u's are in fact serially independent. What would happen in this situation if, 
assuming that $u_t = \rho u_{t-1} + \varepsilon_t$, we were to use the following generalized difference 
regression? 

$$Y_t - \rho Y_{t-1} = \beta_1(1 - \rho) + \beta_2 X_t - \rho \beta_2 X_{t-1} + \varepsilon_t$$

Discuss in particular the properties of the disturbance term $\varepsilon_t$. 

12.15. In a study of the determination of prices of final output at factor cost in the United 
Kingdom, the following results were obtained on the basis of annual data for the 
period 1951-1969: 

$$\widehat{PF}_t = 2.033 + 0.273 W_t - 0.521 X_t + 0.256 M_t + 0.028 M_{t-1} + 0.121 PF_{t-1}$$
se = (0.992) (0.127) (0.099) (0.024) (0.039) (0.119) 

R² = 0.984    d = 2.54 

where PF = prices of final output at factor cost, W = wages and salaries per employee, 
X = gross domestic product per person employed, M = import prices, $M_{t-1}$ = 
import prices lagged 1 year, and $PF_{t-1}$ = prices of final output at factor cost in the 
previous year.* 

“Since for 18 observations and 5 explanatory variables, the 5 percent lower and 
upper d values are 0.71 and 2.06, the estimated d value of 2.54 indicates that there 
is no positive autocorrelation.” Comment. 

12.16. Give circumstances under which each of the following methods of estimating the 
first-order coefficient of autocorrelation ρ may be appropriate: 

a. First-difference regression. 

b. Moving average regression. 

c. Theil-Nagar transform. 

d. Cochrane and Orcutt iterative procedure. 

e. Hildreth-Lu scanning procedure. 

f. Durbin two-step procedure. 

12.17. Consider the model: 

$$Y_t = \beta_1 + \beta_2 X_t + u_t$$

where 

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \varepsilon_t$$


*Source: Prices and Earnings in 1951-1969: An Econometric Assessment, Department of Employment, 
Her Majesty's Stationery Office, 1971, Table C, p. 37, Eq. 63. 
that is, the error term follows an AR(2) scheme and $\varepsilon_t$ is a white noise error term. 
Outline the steps you would take to estimate the model taking into account the 
second-order autoregression. 

12.18. Including the correction factor C, the formula for $\hat\beta_2^{GLS}$ given in Eq. (12.3.1) is 

$$\hat\beta_2^{GLS} = \frac{(1 - \rho^2) x_1 y_1 + \sum_{t=2}^{n} (x_t - \rho x_{t-1})(y_t - \rho y_{t-1})}{(1 - \rho^2) x_1^2 + \sum_{t=2}^{n} (x_t - \rho x_{t-1})^2}$$

Given this formula and Eq. (12.3.1), find the expression for the correction factor C. 

12.19. Show that estimating Eq. (12.9.5) is equivalent to estimating the GLS discussed in 
Section 12.3, excluding the first observation on Y and X. 

12.20. For regression (12.9.9), the estimated residuals have the following signs, which for 
ease of exposition are bracketed. 

(++++)(-)(+++++++)(-)(++++)(—)(+)(—)(+)(—)(++)(-) 

(+)(-)(+) 

On the basis of the runs test, do you reject the null hypothesis that there is no autocorrelation 
in the residuals? 

*12.21. Testing for higher-order serial correlation. Suppose we have time series data on a 
quarterly basis. In regression models involving quarterly data, instead of using the 
AR(1) scheme given in Eq. (12.2.1), it may be more appropriate to assume an 
AR(4) scheme as follows: 

$$u_t = \rho_4 u_{t-4} + \varepsilon_t$$

that is, to assume that the current disturbance term is correlated with that of the 
same quarter in the previous year rather than that of the preceding quarter. 

To test the hypothesis that ρ₄ = 0, Wallis† suggests the following modified 
Durbin-Watson d test: 

$$d_4 = \frac{\sum_{t=5}^{n} (\hat u_t - \hat u_{t-4})^2}{\sum_{t=1}^{n} \hat u_t^2}$$

The testing procedure follows the usual d test routine discussed in the text. Wallis 
has prepared d₄ tables, which may be found in his original article. 

Suppose now we have monthly data. Could the Durbin-Watson test be 
generalized to take into account such data? If so, write down the appropriate d₁₂ 
formula. 

12.22. Suppose you estimate the following regression: 

$$\Delta \ln Y_t = \beta_1 + \beta_2 \Delta \ln L_t + \beta_3 \Delta \ln K_t + u_t$$

where Y is output, L is labor input, K is capital input, and Δ is the first-difference 
operator. How would you interpret β₁ in this model? Could it be regarded as an 
estimate of technological change? Justify your answer. 


*Optional. 

†Kenneth Wallis, "Testing for Fourth Order Autocorrelation in Quarterly Regression Equations," 
Econometrica, vol. 40, 1972, pp. 617-636. Tables of d₄ can also be found in J. Johnston, op. cit., 3d ed., p. 558. 
12.23. As noted in the text, Maddala has suggested that if the Durbin-Watson d is smaller 
than R², one may run the regression in the first-difference form. What is the logic 
behind this suggestion? 

12.24. Refer to Eq. (12.4.1). Assume r = 0 but ρ ≠ 0. What is the effect on $E(\hat\sigma^2)$ if (a) 
0 < ρ < 1 and (b) -1 < ρ < 0? When will the bias in $\hat\sigma^2$ be reasonably small? 

12.25. The residuals from the wages-productivity regression given in Eq. (12.5.2) were 
regressed on lagged residuals going back six periods (i.e., AR[6]), yielding the following 
results: 

Dependent Variable: S1 
Method: Least Squares 
Sample (adjusted): 1966-2005 
Included observations: 40 after adjustments 

Variable    Coefficient   Std. Error   t-Statistic   Prob. 
S1(-1)       1.019716     0.170999      5.963275     0.0000 
S1(-2)      -0.029679     0.244152     -0.121560     0.9040 
S1(-3)      -0.286782     0.241975     -1.185171     0.2442 
S1(-4)       0.149212     0.242076      0.616386     0.5417 
S1(-5)      -0.071371     0.243386     -0.293240     0.7711 
S1(-6)       0.034362     0.167077      0.205663     0.8383 

R-squared            0.749857    Mean dependent var.   0.004433 
Adjusted R-squared   0.713071    S.D. dependent var.   0.019843 
S.E. of regression   0.010629    Durbin-Watson stat.   1.956818 
Sum squared resid.   0.003841 




a. From the preceding results, what can you say about the nature of autocorrelation 
in the logarithmic wages-productivity data? 

b. If you think that an AR(1) mechanism characterizes autocorrelation in our data, 
would you use the first-difference transformation to get rid of autocorrelation? 
Justify your answer. 

Empirical Exercises 

12.26. Refer to the data on the copper industry given in Table 12.7. 

a. From these data estimate the following regression model: 

$$\ln C_t = \beta_1 + \beta_2 \ln I_t + \beta_3 \ln L_t + \beta_4 \ln H_t + \beta_5 \ln A_t + u_t$$
Interpret the results. 

b. Obtain the residuals and standardized residuals from the preceding regression 
and plot them. What can you surmise about the presence of autocorrelation in 
these residuals? 

c. Estimate the Durbin-Watson d statistic and comment on the nature of autocorrelation 
present in the data. 

d. Carry out the runs test and see if your answer differs from that just given in (c). 

e. How would you find out if an AR(p) process better describes autocorrelation 
than an AR(1) process? 

Note: Save the data for further analysis. (See Exercise 12.28.) 
TABLE 12.7 
Determinants of U.S. Domestic Price of Copper, 1951-1980 

Year      C        G        I       L         H        A 
1951    21.89    330.2    45.1   220.4   1,491.0   19.00 
  52    22.29    347.2    50.9   259.5   1,504.0   19.41 
  53    19.63    366.1    53.3   256.3   1,438.0   20.93 
  54    22.85    366.3    53.6   249.3   1,551.0   21.78 
  55    33.77    399.3    54.6   352.3   1,646.0   23.68 
  56    39.18    420.7    61.1   329.1   1,349.0   26.01 
  57    30.58    442.0    61.9   219.6   1,224.0   27.52 
  58    26.30    447.0    57.9   234.8   1,382.0   26.89 
  59    30.70    483.0    64.8   237.4   1,553.7   26.85 
  60    32.10    506.0    66.2   245.8   1,296.1   27.23 
  61    30.00    523.3    66.7   229.2   1,365.0   25.46 
  62    30.80    563.8    72.2   233.9   1,492.5   23.88 
  63    30.80    594.7    76.5   234.2   1,634.9   22.62 
  64    32.60    635.7    81.7   347.0   1,561.0   23.72 
  65    35.40    688.1    89.8   468.1   1,509.7   24.50 
  66    36.60    753.0    97.8   555.0   1,195.8   24.50 
  67    38.60    796.3   100.0   418.0   1,321.9   24.98 
  68    42.20    868.5   106.3   525.2   1,545.4   25.58 
  69    47.90    935.5   111.1   620.7   1,499.5   27.18 
  70    58.20    982.4   107.8   588.6   1,469.0   28.72 
  71    52.00  1,063.4   109.6   444.4   2,084.5   29.00 
  72    51.20  1,171.1   119.7   427.8   2,378.5   26.67 
  73    59.50  1,306.6   129.8   727.1   2,057.5   25.33 
  74    77.30  1,412.9   129.3   877.6   1,352.5   34.06 
  75    64.20  1,528.8   117.8   556.6   1,171.4   39.79 
  76    69.60  1,700.1   129.8   780.6   1,547.6   44.49 
  77    66.80  1,887.2   137.1   750.7   1,989.8   51.23 
  78    66.50  2,127.6   145.2   709.8   2,023.3   54.42 
  79    98.30  2,628.8   152.5   935.7   1,749.2   61.01 
  80   101.40  2,633.1   147.1   940.9   1,298.5   70.87 


Note: The data were collected by Gary R. Smith from sources such as American Metal Market, Metals Week, and U.S. 
Department of Commerce publications. 

C = 12-month average U.S. domestic price of copper (cents per pound). 

G = annual gross national product ($, billions). 

I = 12-month average index of industrial production. 

L = 12-month average London Metal Exchange price of copper (pounds sterling). 

H = number of housing starts per year (thousands of units).

A = 12-month average price of aluminum (cents per pound). 
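For parts (a) through (d), a minimal NumPy sketch along the following lines fits the log-linear model to the Table 12.7 data by OLS, forms the standardized residuals (residuals divided by the standard error of the regression), computes the Durbin-Watson d, and counts the runs of same-signed residuals. The helper names (`ols`, `durbin_watson`, `count_runs`) are hypothetical; the run count is only the input to the runs test, whose critical values still have to come from the runs-test tables or the normal approximation.

```python
import numpy as np

# Table 12.7, 1951-1980 (G is not used in the model)
C = [21.89, 22.29, 19.63, 22.85, 33.77, 39.18, 30.58, 26.30, 30.70, 32.10,
     30.00, 30.80, 30.80, 32.60, 35.40, 36.60, 38.60, 42.20, 47.90, 58.20,
     52.00, 51.20, 59.50, 77.30, 64.20, 69.60, 66.80, 66.50, 98.30, 101.40]
I = [45.1, 50.9, 53.3, 53.6, 54.6, 61.1, 61.9, 57.9, 64.8, 66.2,
     66.7, 72.2, 76.5, 81.7, 89.8, 97.8, 100.0, 106.3, 111.1, 107.8,
     109.6, 119.7, 129.8, 129.3, 117.8, 129.8, 137.1, 145.2, 152.5, 147.1]
L = [220.4, 259.5, 256.3, 249.3, 352.3, 329.1, 219.6, 234.8, 237.4, 245.8,
     229.2, 233.9, 234.2, 347.0, 468.1, 555.0, 418.0, 525.2, 620.7, 588.6,
     444.4, 427.8, 727.1, 877.6, 556.6, 780.6, 750.7, 709.8, 935.7, 940.9]
H = [1491.0, 1504.0, 1438.0, 1551.0, 1646.0, 1349.0, 1224.0, 1382.0, 1553.7, 1296.1,
     1365.0, 1492.5, 1634.9, 1561.0, 1509.7, 1195.8, 1321.9, 1545.4, 1499.5, 1469.0,
     2084.5, 2378.5, 2057.5, 1352.5, 1171.4, 1547.6, 1989.8, 2023.3, 1749.2, 1298.5]
A = [19.00, 19.41, 20.93, 21.78, 23.68, 26.01, 27.52, 26.89, 26.85, 27.23,
     25.46, 23.88, 22.62, 23.72, 24.50, 24.50, 24.98, 25.58, 27.18, 28.72,
     29.00, 26.67, 25.33, 34.06, 39.79, 44.49, 51.23, 54.42, 61.01, 70.87]

def ols(y, X):
    """OLS coefficients and residuals; X already contains the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def durbin_watson(e):
    """d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def count_runs(e):
    """Number of runs of same-signed residuals (input to the runs test)."""
    signs = np.sign(e)
    return 1 + int(np.sum(signs[1:] != signs[:-1]))

y = np.log(C)
X = np.column_stack([np.ones(len(C)), np.log(I), np.log(L), np.log(H), np.log(A)])
beta, resid = ols(y, X)
n, k = X.shape
sigma_hat = np.sqrt(resid @ resid / (n - k))
std_resid = resid / sigma_hat      # standardized residuals for plotting in part (b)
d = durbin_watson(resid)
runs = count_runs(resid)
print("coefficients:", np.round(beta, 4))
print("d =", round(d, 4), " runs =", runs)
```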


12.27. You are given the data in Table 12.8. 

a. Verify that Durbin-Watson d = 0.4148. 

b. Is there positive serial correlation in the disturbances? 

c. If so, estimate ρ by the

i. Theil-Nagar method.

ii. Durbin two-step procedure.

iii. Cochrane-Orcutt method.

d. Use the Theil-Nagar method to transform the data and run the regression on the 
transformed data. 

e. Does the regression estimated in (d) exhibit autocorrelation? If so, how would 
you get rid of it? 
TABLE 12.8

    Y   X (time)       Ŷ (estimated Y)   û (residuals)
281.4    1 (= 1956)       261.4208          19.9791
288.1    2                276.6026          11.4973
290.0    3                291.7844          -1.7844
307.3    4                306.9661           0.3338
316.1    5                322.1479          -6.0479
322.5    6                337.3297         -14.8297
338.4    7                352.5115         -14.1115
353.3    8                367.6933         -14.3933
373.7    9                382.8751          -9.1751
397.7   10                398.0569          -0.3569
418.1   11                413.2386           4.8613
430.1   12                428.4206           1.6795
452.7   13                443.6022           9.0977
469.1   14                458.7840          10.3159
476.9   15 (= 1970)       473.9658           2.9341

Y = personal consumption expenditure, billions of 1958 dollars; X = time; Ŷ = estimated Y; û = residuals.

Note: Data for Ŷ obtained from the regression $Y_t = \beta_1 + \beta_2 X_t + u_t$.
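Part (a) and the two d-based estimates of ρ can be checked with a few lines of Python. This is a sketch: it uses the printed residuals, takes ρ̂ ≈ 1 − d/2, and writes the Theil-Nagar estimator in the form in which it is usually stated, ρ̂ = [n²(1 − d/2) + k²]/(n² − k²), with k the number of coefficients including the intercept. The Durbin two-step and Cochrane-Orcutt estimates require rerunning regressions on the data and are not shown.

```python
import numpy as np

# Residuals from Table 12.8 (1956-1970)
resid = np.array([19.9791, 11.4973, -1.7844, 0.3338, -6.0479, -14.8297, -14.1115,
                  -14.3933, -9.1751, -0.3569, 4.8613, 1.6795, 9.0977, 10.3159, 2.9341])

def durbin_watson(e):
    # sum of squared successive differences divided by the residual sum of squares
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

n = len(resid)   # 15 observations
k = 2            # coefficients in Y_t = beta1 + beta2 X_t + u_t, intercept included
d = durbin_watson(resid)
rho_from_d = 1 - d / 2
# Theil-Nagar small-sample modification of the d-based estimate
rho_theil_nagar = (n**2 * (1 - d / 2) + k**2) / (n**2 - k**2)
print(f"d = {d:.4f}, rho (1 - d/2) = {rho_from_d:.4f}, rho (Theil-Nagar) = {rho_theil_nagar:.4f}")
```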


12.28. Refer to Exercise 12.26 and the data given in Table 12.7. If the results of this exercise show serial correlation,

a. Use the Cochrane-Orcutt two-stage procedure and obtain the estimates of the feasible GLS or the generalized difference regression and compare your results.

b. If the ρ estimated from the Cochrane-Orcutt method in (a) differs substantially from that estimated from the d statistic, which method of estimating ρ would you choose and why?

12.29. Refer to Example 7.4. Omitting the variables X² and X³, run the regression and examine the residuals for “serial” correlation. If serial correlation is found, how would you rationalize it? What remedial measures would you suggest?

12.30. Refer to Exercise 7.21. A priori autocorrelation is expected in such data. Therefore, it is suggested that you regress the log of real money supply on the logs of real national income and long-term interest rate in the first-difference form. Run this regression, and then rerun the regression in the original form. Is the assumption underlying the first-difference transformation satisfied? If not, what kinds of biases are likely to result from such a transformation? Illustrate with the data at hand.

12.31. The use of Durbin-Watson d for testing nonlinearity. Continue with Exercise 12.29. Arrange the residuals obtained in that regression according to increasing values of X. Using the formula given in Eq. (12.6.5), estimate d from the rearranged residuals. If the computed d value indicates autocorrelation, this would imply that the linear model was incorrect and that the full model should include X² and X³ terms. Can you give an intuitive justification for such a procedure? See if your answer agrees with that given by Henri Theil.*

12.32. Refer to Exercise 11.22. Obtain the residuals and find out if there is autocorrelation in the residuals. How would you transform the data in case serial correlation is detected? What is the meaning of serial correlation in the present instance?


*Henri Theil, Introduction to Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1978, pp. 307-308.
12.33. Monte Carlo experiment. Refer to Tables 12.1 and 12.2. Using the $\varepsilon_t$ and $X_t$ data given there, generate a sample of 10 Y values from the model

$Y_t = 3.0 + 0.5X_t + u_t$

where $u_t = 0.9u_{t-1} + \varepsilon_t$. Assume $u_0 = 10$.

a. Estimate the equation and comment on your results. 

b. Now assume $u_0 = 17$. Repeat this exercise 10 times and comment on the results.

c. Keep the preceding setup intact except now let ρ = 0.3 instead of ρ = 0.9 and compare your results with those given in (b).
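A minimal Python sketch of the experiment follows. Because Tables 12.1 and 12.2 are not reproduced here, it assumes, purely for illustration, X_t = 1, 2, ..., 10 and standard normal draws in place of the tabulated ε_t; substitute the table values to reproduce the exercise exactly. The helper names are hypothetical.

```python
import numpy as np

def simulate_ar1_sample(x, eps, rho, u0, beta1=3.0, beta2=0.5):
    """Y_t = beta1 + beta2*X_t + u_t with u_t = rho*u_{t-1} + eps_t, starting from u0."""
    u = np.empty(len(x))
    prev = u0
    for t in range(len(x)):
        prev = rho * prev + eps[t]
        u[t] = prev
    return beta1 + beta2 * x + u, u

def ols_fit(y, x):
    """Intercept and slope from OLS of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(12)   # arbitrary seed
x = np.arange(1.0, 11.0)          # ASSUMPTION: stand-in for the Table 12.2 X values

for rho, u0 in [(0.9, 10.0), (0.9, 17.0), (0.3, 17.0)]:
    estimates = []
    for rep in range(10):                 # 10 replications, as in part (b)
        eps = rng.standard_normal(10)     # ASSUMPTION: N(0, 1) stand-in for the table's eps_t
        y, u = simulate_ar1_sample(x, eps, rho, u0)
        estimates.append(ols_fit(y, x))
    estimates = np.array(estimates)
    print(f"rho={rho}, u0={u0}: mean b1={estimates[:, 0].mean():.3f}, "
          f"mean b2={estimates[:, 1].mean():.3f}")
```

Comparing the mean intercept across the u₀ = 10 and u₀ = 17 runs shows how a large starting value, carried forward by ρ = 0.9, contaminates a sample of only 10 observations.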

12.34. Using the data given in Table 12.9, estimate the model 

$Y_t = \beta_1 + \beta_2 X_t + u_t$

where Y = inventories and X = sales, both measured in millions of dollars.

a. Estimate the preceding regression. 

b. From the estimated residuals find out if there is positive autocorrelation using (i) the 
Durbin-Watson test and (ii) the large-sample normality test given in Eq. (12.6.13). 

c. If ρ is positive, apply the Berenblut-Webb test to test the hypothesis that ρ = 1.

d. If you suspect that the autoregressive error structure is of order p, use the Breusch-Godfrey test to verify this. How would you choose the order of p?

e. On the basis of the results of this test, how would you transform the data to 
remove autocorrelation? Show all your calculations. 


TABLE 12.9 Inventories and Sales in U.S. Manufacturing, 1950-1991 (millions of dollars)

Year   Sales*  Inventories†   Ratio    Year   Sales*  Inventories†   Ratio
1950   46,486        84,646    1.82    1971  224,619       369,374    1.57
1951   50,229        90,560    1.80    1972  236,698       391,212    1.63
1952   53,501        98,145    1.83    1973  242,686       405,073    1.65
1953   52,805       101,599    1.92    1974  239,847       390,950    1.65
1954   55,906       102,567    1.83    1975  250,394       382,510    1.54
1955   63,027       108,121    1.72    1976  242,002       378,762    1.57
1956   72,931       124,499    1.71    1977  251,708       379,706    1.50
1957   84,790       157,625    1.86    1978  269,843       399,970    1.44
1958   86,589       159,708    1.84    1979  289,973       424,843    1.44
1959   98,797       174,636    1.77    1980  299,766       430,518    1.43
1960  113,201       188,378    1.66    1981  319,558       443,622    1.37
1961  126,905       211,691    1.67    1982  324,984       449,083    1.38
1962  143,936       242,157    1.68    1983  335,991       463,563    1.35
1963  154,391       265,215    1.72    1984  350,715       481,633    1.35
1964  168,129       283,413    1.69    1985  330,875       428,108    1.38
1965  163,351       311,852    1.95    1986  326,227       423,082    1.29
1966  172,547       312,379    1.78    1987  334,616       408,226    1.24
1967  190,682       339,516    1.73    1988  359,081       439,821    1.18
1968  194,538       334,749    1.73    1989  394,615       479,106    1.17
1969  194,657       322,654    1.68    1990  411,663       509,902    1.21
1970  206,326       338,109    1.59

*Annual data are averages of monthly, not seasonally adjusted, figures.

†Seasonally adjusted, end of period; figures beginning 1982 are not comparable with earlier period.

Source: Economic Report of the President, 1993, Table B-53, p. 408.
f. Repeat the preceding steps using the following model:

$\ln Y_t = \beta_1 + \beta_2 \ln X_t + u_t$

g. How would you decide between the linear and log-linear specifications? Show 
explicitly the test(s) you use. 
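A hedged NumPy sketch for parts (a) to (d) follows. It fits the linear regression of inventories on sales, computes d, forms the Berenblut-Webb statistic g as the residual sum of squares from the first-difference regression (run without an intercept) divided by that from the levels regression, and computes a Breusch-Godfrey statistic (n − p)R² from an auxiliary regression of the residuals on the regressors and p lagged residuals, dropping the first p observations. The function names and the choice p = 2 are assumptions for illustration; g must still be compared with Durbin-Watson bounds under the null hypothesis ρ = 1, and the LM value with χ² critical values with p df.

```python
import numpy as np

# Table 12.9, 1950-1990 (millions of dollars)
sales = np.array([46486, 50229, 53501, 52805, 55906, 63027, 72931, 84790, 86589, 98797,
                  113201, 126905, 143936, 154391, 168129, 163351, 172547, 190682, 194538, 194657,
                  206326, 224619, 236698, 242686, 239847, 250394, 242002, 251708, 269843, 289973,
                  299766, 319558, 324984, 335991, 350715, 330875, 326227, 334616, 359081, 394615,
                  411663], dtype=float)
inventories = np.array([84646, 90560, 98145, 101599, 102567, 108121, 124499, 157625, 159708, 174636,
                        188378, 211691, 242157, 265215, 283413, 311852, 312379, 339516, 334749, 322654,
                        338109, 369374, 391212, 405073, 390950, 382510, 378762, 379706, 399970, 424843,
                        430518, 443622, 449083, 463563, 481633, 428108, 423082, 408226, 439821, 479106,
                        509902], dtype=float)

def ols_resid(y, X):
    """OLS coefficients and residuals; X must already hold any intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def breusch_godfrey(u, X, p):
    """LM = (n - p) * R^2 from regressing u_t on X_t and u_{t-1}, ..., u_{t-p}."""
    n = len(u)
    lags = np.column_stack([u[p - j:n - j] for j in range(1, p + 1)])
    Z = np.column_stack([X[p:], lags])
    y = u[p:]
    _, v = ols_resid(y, Z)
    r2 = 1.0 - np.sum(v ** 2) / np.sum((y - y.mean()) ** 2)
    return (n - p) * r2

# (a) levels regression: inventories_t = b1 + b2 * sales_t + u_t
X_levels = np.column_stack([np.ones_like(sales), sales])
beta, u_hat = ols_resid(inventories, X_levels)

# (b)(i) Durbin-Watson d
d = np.sum(np.diff(u_hat) ** 2) / np.sum(u_hat ** 2)

# (c) Berenblut-Webb g: first-difference regression without intercept
_, e_hat = ols_resid(np.diff(inventories), np.diff(sales)[:, None])
g = np.sum(e_hat ** 2) / np.sum(u_hat ** 2)

# (d) Breusch-Godfrey with an assumed lag order p = 2
lm = breusch_godfrey(u_hat, X_levels, p=2)
print(f"b = {np.round(beta, 4)}, d = {d:.4f}, g = {g:.4f}, BG LM(2) = {lm:.3f}")
```

For part (f), the same helpers can be reused on `np.log(inventories)` and `np.log(sales)`.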

12.35. Table 12.10 gives data on the real rate of return on common stocks at time t ($RR_t$), output growth in period (t + 1) ($OG_{t+1}$), and inflation in period t ($Inf_t$), all in percent form, for the U.S. economy for the period 1954-1981.

a. Regress $RR_t$ on inflation.

b. Regress $RR_t$ on $OG_{t+1}$ and $Inf_t$.

c. Comment on the two regression results in view of Eugene Fama’s observation that “the negative simple correlation between real stock returns and inflation is spurious because it is the result of two structural relationships: a positive relation between current real stock returns and expected output growth [measured by $OG_{t+1}$], and a negative relationship between expected output growth and current inflation.”

d. Would you expect autocorrelation in either of the regressions in (a) and (b)? Why or why not? If you do, take the appropriate corrective action and present the revised results.


TABLE 12.10 Rate of Return, Output Growth and Inflation, United States, 1954-1981

Observation     RR   Growth  Inflation
1954          53.0      6.7       -0.4
1955          31.2      2.1        0.4
1956           3.7      1.8        2.9
1957         -13.8     -0.4        3.0
1958          41.7      6.0        1.7
1959          10.5      2.1        1.5
1960          -1.3      2.6        1.8
1961          26.1      5.8        0.8
1962         -10.5      4.0        1.8
1963          21.2      5.3        1.6
1964          15.5      6.0        1.0
1965          10.2      6.0        2.3
1966         -13.3      2.7        3.2
1967          21.3      4.6        2.7
1968           6.8      2.8        4.3
1969         -13.5     -0.2        5.0
1970          -0.4      3.4        4.4
1971          10.5      5.7        3.8
1972          15.4      5.8        3.6
1973         -22.6     -0.6        7.9
1974         -37.3     -1.2       10.8
1975          31.2      5.4        6.0
1976          19.1      5.5        4.7
1977         -13.1      5.0        5.9
1978          -1.3      2.8        7.9
1979           8.6     -0.3        9.8
1980         -22.2      2.6       10.2
1981         -12.2     -1.9        7.3
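For parts (a) and (b), a minimal NumPy sketch using the Table 12.10 values is given below. It reports both sets of OLS coefficients, the Durbin-Watson d of each regression as a first check for part (d), and the simple correlation between RR and inflation that Fama’s remark refers to. It assumes, as the exercise text states, that the Growth column is already aligned as output growth in period t + 1; the variable and helper names are hypothetical.

```python
import numpy as np

# Table 12.10, 1954-1981 (percent)
rr = np.array([53.0, 31.2, 3.7, -13.8, 41.7, 10.5, -1.3, 26.1, -10.5, 21.2, 15.5, 10.2, -13.3, 21.3,
               6.8, -13.5, -0.4, 10.5, 15.4, -22.6, -37.3, 31.2, 19.1, -13.1, -1.3, 8.6, -22.2, -12.2])
og_next = np.array([6.7, 2.1, 1.8, -0.4, 6.0, 2.1, 2.6, 5.8, 4.0, 5.3, 6.0, 6.0, 2.7, 4.6,
                    2.8, -0.2, 3.4, 5.7, 5.8, -0.6, -1.2, 5.4, 5.5, 5.0, 2.8, -0.3, 2.6, -1.9])
infl = np.array([-0.4, 0.4, 2.9, 3.0, 1.7, 1.5, 1.8, 0.8, 1.8, 1.6, 1.0, 2.3, 3.2, 2.7,
                 4.3, 5.0, 4.4, 3.8, 3.6, 7.9, 10.8, 6.0, 4.7, 5.9, 7.9, 9.8, 10.2, 7.3])

def ols(y, *regressors):
    """OLS with an intercept; returns coefficients, residuals, and the design matrix."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta, X

def durbin_watson(e):
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

b_a, e_a, X_a = ols(rr, infl)            # (a) RR_t on Inf_t
b_b, e_b, X_b = ols(rr, og_next, infl)   # (b) RR_t on OG_{t+1} and Inf_t
print("(a)", np.round(b_a, 3), "d =", round(durbin_watson(e_a), 3))
print("(b)", np.round(b_b, 3), "d =", round(durbin_watson(e_b), 3))
print("corr(RR, Inf) =", round(np.corrcoef(rr, infl)[0, 1], 3))
```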
12.36. The Durbin h statistic. Consider the following model of wage determination:

$Y_t = \beta_1 + \beta_2 X_t + \beta_3 Y_{t-1} + u_t$

where Y = wages = index of real compensation per hour
X = productivity = index of output per hour.

a. Using the data in Table 12.4, estimate the above model and interpret your results. 

b. Since the model contains lagged regressand as a regressor, the Durbin-Watson d 
is not appropriate to find out if there is serial correlation in the data. For 
such models, called autoregressive models, Durbin has developed the so-called 
h statistic to test for first-order autocorrelation, which is defined as:* 

$h = \hat{\rho}\sqrt{\dfrac{n}{1 - n\,[\mathrm{var}(\hat{\beta}_3)]}}$

where n = sample size, $\mathrm{var}(\hat{\beta}_3)$ = variance of the coefficient of the lagged $Y_{t-1}$, and $\hat{\rho}$ = estimate of the first-order serial correlation.

For large sample size (technically, asymptotic), Durbin has shown that, under the null hypothesis that ρ = 0,

$h \sim N(0, 1)$

that is, the h statistic follows the standard normal distribution. From the properties of the normal distribution we know that the probability of |h| > 1.96 is about 5 percent. Therefore, if in an application |h| > 1.96, we can reject the null hypothesis that ρ = 0, that is, there is evidence of first-order autocorrelation in the autoregressive model given above.

To apply the test, we proceed as follows: First, estimate the above model by OLS (don’t worry about any estimation problems at this stage). Second, note $\mathrm{var}(\hat{\beta}_3)$ in this model as well as the routinely computed d statistic. Third, using the d value, obtain $\hat{\rho} \approx 1 - d/2$. It is interesting to note that although we cannot use the d value to test for serial correlation in this model, we can use it to obtain an estimate of ρ. Fourth, now compute the h statistic. Fifth, if the sample size is reasonably large and if the computed |h| exceeds 1.96, we can conclude that there is evidence of first-order autocorrelation. Of course, you can use any level of significance you want.

Apply the h test to the autoregressive wage determination model given earlier 
and draw appropriate conclusions and compare your results with those given in 
regression (12.5.1). 
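The five-step recipe in part (b) translates into a short function. This is a sketch with hypothetical names; note that h cannot be computed when $n\,\mathrm{var}(\hat{\beta}_3) \geq 1$, because the quantity under the square root is then not positive, so the function returns `None` in that case. The inputs in the usage line are made up for illustration and are not taken from Table 12.4.

```python
import math

def durbin_h(d, n, var_b3):
    """Durbin h statistic for a model with Y_{t-1} as a regressor.

    d      -- Durbin-Watson d from the OLS fit
    n      -- sample size
    var_b3 -- estimated variance of the coefficient on Y_{t-1}
    Returns None when n * var_b3 >= 1 (h is then undefined).
    """
    rho_hat = 1.0 - d / 2.0          # rho_hat is approximately 1 - d/2
    denom = 1.0 - n * var_b3
    if denom <= 0:
        return None
    return rho_hat * math.sqrt(n / denom)

# Hypothetical inputs, for illustration only
h = durbin_h(d=1.0, n=100, var_b3=0.001)
print(f"h = {h:.4f}; reject rho = 0 at 5%: {abs(h) > 1.96}")
```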

12.37. Dummy variables and autocorrelation. Refer to the savings-income regression discussed in Chapter 9. Using the data given in Table 9.2, and assuming an AR(1) scheme, reestimate the savings-income regression, taking into account autocorrelation. Pay close attention to the transformation of the dummy variable. Compare your results with those presented in Chapter 9.

12.38. Using the wages-productivity data given in Table 12.4, estimate model (12.9.8) and 
compare your results with those given in regression (12.9.9). What conclusion(s) 
do you draw? 


*J. Durbin, "Testing for Serial Correlation in Least-squares Regression When Some of the Regressors 
Are Lagged Dependent Variables," Econometrica, vol. 38, pp. 410-421. 
Appendix 12A


12A.1 Proof that the Error Term $v_t$ in Equation (12.1.11) Is Autocorrelated


Since $v_t = u_t - u_{t-1}$, it is easy to show that $E(v_t) = E(u_t - u_{t-1}) = E(u_t) - E(u_{t-1}) = 0$, since $E(u) = 0$ for each t. Now, $\mathrm{var}(v_t) = \mathrm{var}(u_t - u_{t-1}) = \mathrm{var}(u_t) + \mathrm{var}(u_{t-1}) = 2\sigma^2$, since the variance of each $u_t$ is $\sigma^2$ and the u’s are independently distributed. Hence, $v_t$ is homoscedastic. But, because the cross products $u_t u_{t-1}$, $u_t u_{t-2}$, and $u_{t-1}u_{t-2}$ have zero expectation,

$\mathrm{cov}(v_t, v_{t-1}) = E(v_t v_{t-1}) = E[(u_t - u_{t-1})(u_{t-1} - u_{t-2})] = -E(u_{t-1}^2) = -\sigma^2$

which is obviously nonzero. Therefore, although the u’s are not autocorrelated, the v’s are.
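The result can be checked numerically. Since $\mathrm{var}(v_t) = 2\sigma^2$ and $\mathrm{cov}(v_t, v_{t-1}) = -\sigma^2$, the first-order autocorrelation of $v_t$ should be −1/2 even though the u’s are serially independent. The sketch below assumes standard normal u’s (σ² = 1); any independent zero-mean distribution would serve.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(200_000)   # independent u_t with sigma^2 = 1
v = u[1:] - u[:-1]                 # v_t = u_t - u_{t-1}

def lag1_corr(z):
    """Sample correlation between z_t and z_{t-1}."""
    return np.corrcoef(z[1:], z[:-1])[0, 1]

print("var(v):", round(v.var(), 3), " theory: 2")
print("corr(v_t, v_t-1):", round(lag1_corr(v), 3), " theory: -0.5")
print("corr(u_t, u_t-1):", round(lag1_corr(u), 3), " theory: 0")
```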


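12A.2 Proof of Equations (12.2.3), (12.2.4), and (12.2.5)

$u_t = \rho u_{t-1} + \varepsilon_t$   (1)

$E(u_t) = \rho E(u_{t-1}) + E(\varepsilon_t) = 0$   (2)

$\mathrm{var}(u_t) = \rho^2\,\mathrm{var}(u_{t-1}) + \mathrm{var}(\varepsilon_t)$   (3)

because the u’s and ε’s are uncorrelated.

Since $\mathrm{var}(u_t) = \mathrm{var}(u_{t-1}) = \sigma^2$ and $\mathrm{var}(\varepsilon_t) = \sigma_\varepsilon^2$, we get

$\mathrm{var}(u_t) = \dfrac{\sigma_\varepsilon^2}{1 - \rho^2}$

Now multiply Eq. (1) by $u_{t-1}$ and take expectations on both sides to obtain:

$\mathrm{cov}(u_t, u_{t-1}) = E(u_t u_{t-1}) = E[\rho u_{t-1}^2 + u_{t-1}\varepsilon_t] = \rho E(u_{t-1}^2)$

Noting that the covariance between $u_{t-1}$ and $\varepsilon_t$ is zero (why?) and that $\mathrm{var}(u_t) = \mathrm{var}(u_{t-1}) = \sigma_\varepsilon^2/(1 - \rho^2)$, we obtain

$\mathrm{cov}(u_t, u_{t-1}) = \rho\,\dfrac{\sigma_\varepsilon^2}{1 - \rho^2}$

Continuing in this fashion,

$\mathrm{cov}(u_t, u_{t-2}) = \rho^2\,\dfrac{\sigma_\varepsilon^2}{1 - \rho^2}$

$\mathrm{cov}(u_t, u_{t-3}) = \rho^3\,\dfrac{\sigma_\varepsilon^2}{1 - \rho^2}$

and so on. Now the correlation coefficient is the ratio of covariance to variance. Hence

$\mathrm{cor}(u_t, u_{t-1}) = \rho, \quad \mathrm{cor}(u_t, u_{t-2}) = \rho^2$, and so on.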
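A quick simulation can be used to check these formulas. It uses the illustrative values ρ = 0.7 and $\sigma_\varepsilon = 1$ (not values from the text): the sample variance should be close to $\sigma_\varepsilon^2/(1-\rho^2)$ and the sample autocorrelations close to ρ, ρ², ρ³. The first u is drawn from the stationary distribution, which matches the assumption $\mathrm{var}(u_t) = \mathrm{var}(u_{t-1})$ used in the derivation.

```python
import numpy as np

rho, sigma_eps, n = 0.7, 1.0, 200_000
rng = np.random.default_rng(1)
eps = rng.normal(0.0, sigma_eps, n)

u = np.empty(n)
# start from the stationary distribution so that var(u_t) = var(u_{t-1})
u[0] = rng.normal(0.0, sigma_eps / np.sqrt(1 - rho**2))
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]

theory_var = sigma_eps**2 / (1 - rho**2)
print("var(u):", round(u.var(), 3), " theory:", round(theory_var, 3))
for s in (1, 2, 3):
    r = np.corrcoef(u[s:], u[:-s])[0, 1]
    print(f"cor(u_t, u_t-{s}) = {r:.3f}   theory rho^{s} = {rho**s:.3f}")
```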





Chapter 13


Econometric Modeling: Model Specification and Diagnostic Testing

Applied econometrics cannot be done mechanically; it needs understanding, intuition and 
skill. 1 

... we generally drive across bridges without worrying about the soundness of their construction because we are reasonably sure that someone rigorously checked their engineering principles and practice. Economists must do likewise with models or else attach the warning “not responsible if attempted use leads to collapse.” 2

Economists’ search for “truth” has over the years given rise to the view that economists are 
people searching in a dark room for a non-existent black cat; econometricians are regularly 
accused of finding one. 3 

One of the assumptions of the classical linear regression model (CLRM), Assumption 9, is 
that the regression model used in the analysis is “correctly” specified: If the model is not 
“correctly” specified, we encounter the problem of model specification error or model 
specification bias. In this chapter we take a close and critical look at this assumption, 
because searching for the correct model is like searching for the Holy Grail. In particular 
we examine the following questions: 

1. How does one go about finding the “correct” model? In other words, what are the 
criteria in choosing a model for empirical analysis? 

2. What types of model specification errors is one likely to encounter in practice? 

3. What are the consequences of specification errors? 

4. How does one detect specification errors? In other words, what are some of the 
diagnostic tools that one can use? 

5. Having detected specification errors, what remedies can one adopt and with what 
benefits? 

6. How does one evaluate the performance of competing models? 


1 Keith Cuthbertson, Stephen G. Hall, and Mark P. Taylor, Applied Econometric Techniques, Michigan University Press, 1992, p. X.

2 David F. Hendry, Dynamic Econometrics, Oxford University Press, U.K., 1995, p. 68. 

3 Peter Kennedy, A Guide to Econometrics, 3d ed., The MIT Press, Cambridge, Mass., 1992, p. 82.
The topic of model specification and evaluation is vast, and very extensive empirical
work has been done in this area. Not only that, but there are philosophical differences on 
this topic. Although we cannot do full justice to this topic in one chapter, we hope to bring 
out some of the essential issues involved in model specification and model evaluation. 

13.1 Model Selection Criteria 


According to Hendry and Richard, a model chosen for empirical analysis should satisfy the 
following criteria: 4 

1. Be data admissible; that is, predictions made from the model must be logically 
possible. 

2. Be consistent with theory; that is, it must make good economic sense. For example, 
if Milton Friedman’s permanent income hypothesis holds, the intercept value in the 
regression of permanent consumption on permanent income is expected to be zero. 

3. Have weakly exogenous regressors; that is, the explanatory variables, or regressors, 
must be uncorrelated with the error term. It may be added that in some situations the 
exogenous regressors may be strictly exogenous. A strictly exogenous variable is independent of current, future, and past values of the error term.

4. Exhibit parameter constancy; that is, the values of the parameters should be stable. 
Otherwise, forecasting will be difficult. As Friedman notes, “The only relevant test of 
the validity of a hypothesis [model] is comparison of its predictions with experience.” 5 In 
the absence of parameter constancy, such predictions will not be reliable.

5. Exhibit data coherency; that is, the residuals estimated from the model must be 
purely random (technically, white noise). In other words, if the regression model is 
adequate, the residuals from this model must be white noise. If that is not the case, there 
is some specification error in the model. Shortly, we will explore the nature of specification 
error(s). 

6. Be encompassing; that is, the model should encompass or include all the rival models 
in the sense that it is capable of explaining their results. In short, other models cannot be an 
improvement over the chosen model. 

It is one thing to list criteria of a “good” model and quite another to actually develop it, 
for in practice one is likely to commit various model specification errors, which we discuss 
in the next section. 

13.2 Types of Specification Errors 

Assume that on the basis of the criteria just listed we arrive at a model that we accept as a 
good model. To be concrete, let this model be 

$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_{1i}$   (13.2.1)

where Y = total cost of production and X = output. Equation (13.2.1) is the familiar textbook example of the cubic total cost function.

4 D. F. Hendry and J. F. Richard, "The Econometric Analysis of Economic Time Series," International 
Statistical Review, vol. 51, 1983, pp. 3-33. 

5 Milton Friedman, “The Methodology of Positive Economics,” in Essays in Positive Economics, University of Chicago Press, Chicago, 1953, p. 7.
But suppose for some reason (say, laziness in plotting the scattergram) a researcher
decides to use the following model: 

$Y_i = \alpha_1 + \alpha_2 X_i + \alpha_3 X_i^2 + u_{2i}$   (13.2.2)

Note that we have changed the notation to distinguish this model from the true model. 

Since Eq. (13.2.1) is assumed true, adopting Eq. (13.2.2) would constitute a specification error, the error consisting in omitting a relevant variable ($X_i^3$). Therefore, the error term $u_{2i}$ in Eq. (13.2.2) is in fact

$u_{2i} = u_{1i} + \beta_4 X_i^3$   (13.2.3)

We shall see shortly the importance of this relationship. 

Now suppose that another researcher uses the following model: 

$Y_i = \lambda_1 + \lambda_2 X_i + \lambda_3 X_i^2 + \lambda_4 X_i^3 + \lambda_5 X_i^4 + u_{3i}$   (13.2.4)

If Eq. (13.2.1) is the “truth,” Eq. (13.2.4) also constitutes a specification error, the error here consisting in including an unnecessary or irrelevant variable in the sense that the true model assumes $\lambda_5$ to be zero. The new error term is in fact

$u_{3i} = u_{1i} - \lambda_5 X_i^4 = u_{1i}$ since $\lambda_5 = 0$ in the true model (Why?)   (13.2.5)

Now assume that yet another researcher postulates the following model: 

$\ln Y_i = \gamma_1 + \gamma_2 X_i + \gamma_3 X_i^2 + \gamma_4 X_i^3 + u_{4i}$   (13.2.6)

In relation to the true model, Eq. (13.2.6) would also constitute a specification bias, the bias 
here being the use of the wrong functional form: In Eq. (13.2.1) Y appears linearly, 
whereas in Eq. (13.2.6) it appears log-linearly. 

Finally, consider the researcher who uses the following model: 

$Y_i^* = \beta_1^* + \beta_2^* X_i^* + \beta_3^* X_i^{*2} + \beta_4^* X_i^{*3} + u_i^*$   (13.2.7)

where $Y_i^* = Y_i + \varepsilon_i$ and $X_i^* = X_i + w_i$, $\varepsilon_i$ and $w_i$ being the errors of measurement. What Eq. (13.2.7) states is that instead of using the true $Y_i$ and $X_i$ we use their proxies, $Y_i^*$ and $X_i^*$, which may contain errors of measurement. Therefore, in Eq. (13.2.7) we commit the errors of measurement bias. In applied work data are plagued by errors of approximations
or errors of incomplete coverage or simply errors of omitting some observations. In the 
social sciences we often depend on secondary data and usually have no way of knowing the 
types of errors, if any, made by the primary data-collecting agency. 

Another type of specification error relates to the way the stochastic error $u_i$ (or $u_t$) enters
the regression model. Consider for instance, the following bivariate regression model 
without the intercept term: 


$Y_i = \beta X_i u_i$   (13.2.8)

where the stochastic error term enters multiplicatively with the property that $\ln u_i$ satisfies
the assumptions of the CLRM, against the following model 

$Y_i = \alpha X_i + u_i$   (13.2.9)

where the error term enters additively. Although the variables are the same in the two models, we have denoted the slope coefficient in Eq. (13.2.8) by β and the slope coefficient in Eq. (13.2.9) by α. Now if Eq. (13.2.8) is the “correct” or “true” model, would the estimated α provide an unbiased estimate of the true β? That is, will $E(\hat{\alpha}) = \beta$? If that is not the case, improper stochastic specification of the error term will constitute another source of specification error.
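One way to explore this question is a small Monte Carlo sketch. It assumes, for illustration only, that $\ln u_i \sim N(0, \sigma^2)$ with σ = 1, fixed X values, and β = 2. When Eq. (13.2.8) is true, the OLS slope through the origin is $\hat{\alpha} = \sum X_i Y_i / \sum X_i^2 = \beta \sum X_i^2 u_i / \sum X_i^2$, so its expectation is $\beta E(u_i)$, which under the lognormal assumption equals $\beta e^{\sigma^2/2}$ rather than β.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, sigma, n, reps = 2.0, 1.0, 50, 5000
x = np.linspace(1.0, 10.0, n)                  # fixed regressor values (hypothetical)

log_u = rng.normal(0.0, sigma, size=(reps, n))  # ln u_i ~ N(0, sigma^2)
y = beta * x * np.exp(log_u)                    # multiplicative error, as in Eq. (13.2.8)

# OLS through the origin for the additive model (13.2.9): alpha_hat = sum(x*y) / sum(x^2)
alpha_hat = (y @ x) / (x @ x)

print("mean alpha_hat:", round(alpha_hat.mean(), 3))
print("beta:", beta, "  beta * exp(sigma^2 / 2):", round(beta * np.exp(sigma**2 / 2), 3))
```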

A specification error that is sometimes overlooked is the interaction among the regressors, 
that is, the multiplicative effect of one or more regressors on the regressand. To illustrate, 
consider the following simplified wage function: 

$\ln W_i = \beta_1 + \beta_2\,\text{Education}_i + \beta_3\,\text{Gender}_i + \beta_4\,(\text{Education}_i)(\text{Gender}_i) + u_i$   (13.2.10)

In this model, the change in the relative wages with respect to education depends on gender: $\partial \ln W / \partial\,\text{Education} = \beta_2 + \beta_4\,\text{Gender}$. Likewise, the change in relative wages with respect to gender, $\beta_3 + \beta_4\,\text{Education}$, depends on education.

To sum up, in developing an empirical model, one is likely to commit one or more of the 
following specification errors: 

1. Omission of a relevant variable(s). 

2. Inclusion of an unnecessary variable(s). 

3. Adoption of the wrong functional form. 

4. Errors of measurement. 

5. Incorrect specification of the stochastic error term. 

6. Assumption that the error term is normally distributed. 

Before turning to an examination of these specification errors in some detail, it may be 
fruitful to distinguish between model specification errors and model mis-specification 
errors. The first four types of error discussed above are essentially in the nature of model 
specification errors in that we have in mind a “true” model but somehow we do not estimate 
the correct model. In model mis-specification errors, we do not know what the true model 
is to begin with. In this context one may recall the controversy between the Keynesians and 
the monetarists. The monetarists give primacy to money in explaining changes in GDP, 
whereas the Keynesians emphasize the role of government expenditure to explain changes 
in GDP. So to speak, these are two competing models. 

In what follows, we will first consider model specification errors and then examine 
model mis-specification errors. 


13.3 Consequences of Model Specification Errors 

Whatever the sources of specification errors, what are the consequences? To keep the discussion simple, we will answer this question in the context of the three-variable model and
consider in this section the first two types of specification errors discussed earlier, namely, 
(1) underfitting a model, that is, omitting relevant variables, and (2) overfitting a model, 
that is, including unnecessary variables. Our discussion here can be easily generalized to 
more than two regressors, but with tedious algebra; 6 matrix algebra becomes almost a 
necessity once we go beyond the three-variable case. 


6 But see Exercise 13.32.
Underfitting a Model (Omitting a Relevant Variable)


Suppose the true model is: 


$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$   (13.3.1)


but for some reason we fit the following model: 


$Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i$   (13.3.2)


The consequences of omitting variable $X_3$ are as follows:

1. If the left-out, or omitted, variable $X_3$ is correlated with the included variable $X_2$, that is, if $r_{23}$, the correlation coefficient between the two variables, is nonzero, then $\hat{\alpha}_1$ and $\hat{\alpha}_2$ are biased as well as inconsistent. That is, $E(\hat{\alpha}_1) \neq \beta_1$ and $E(\hat{\alpha}_2) \neq \beta_2$, and the bias does not disappear as the sample size gets larger.

2. Even if $X_2$ and $X_3$ are not correlated, $\hat{\alpha}_1$ is biased, although $\hat{\alpha}_2$ is now unbiased.

3. The disturbance variance $\sigma^2$ is incorrectly estimated.

4. The conventionally measured variance of $\hat{\alpha}_2$ ($= \sigma^2 / \sum x_{2i}^2$) is a biased estimator of the variance of the true estimator $\hat{\beta}_2$.

5. In consequence, the usual confidence interval and hypothesis-testing procedures are 
likely to give misleading conclusions about the statistical significance of the estimated 
parameters. 

6. As another consequence, the forecasts based on the incorrect model and the forecast 
(confidence) intervals will be unreliable. 

Although proofs of each of the above statements will take us far afield, 7 it is shown in Appendix 13A, Section 13A.1, that

$E(\hat{\alpha}_2) = \beta_2 + \beta_3 b_{32}$   (13.3.3)

where $b_{32}$ is the slope in the regression of the excluded variable $X_3$ on the included variable $X_2$ ($b_{32} = \sum x_{3i}x_{2i} / \sum x_{2i}^2$). As Eq. (13.3.3) shows, $\hat{\alpha}_2$ is biased, unless $\beta_3$ or $b_{32}$ or both are zero. We rule out $\beta_3$ being zero, because in that case we do not have specification error to begin with. The coefficient $b_{32}$ will be zero if $X_2$ and $X_3$ are uncorrelated, which is unlikely in most economic data.
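In any single sample, the OLS estimates obey an exact counterpart of Eq. (13.3.3): the short-regression slope $\hat{\alpha}_2$ equals $\hat{\beta}_2 + \hat{\beta}_3 b_{32}$, where $\hat{\beta}_2$ and $\hat{\beta}_3$ come from the full regression and $b_{32}$ from regressing $X_3$ on $X_2$. The sketch below checks this identity on simulated data; the data-generating values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(10.0, 2.0, n)
x3 = 0.6 * x2 + rng.normal(0.0, 1.0, n)                    # X3 correlated with X2 (hypothetical)
y = 1.0 + 2.0 * x2 + 1.5 * x3 + rng.normal(0.0, 1.0, n)    # "true" model, as in (13.3.1)

def ols(y, *regs):
    """OLS coefficients with an intercept."""
    X = np.column_stack([np.ones_like(y), *regs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b1, b2, b3 = ols(y, x2, x3)    # correctly specified model
a1, a2 = ols(y, x2)            # model omitting X3, as in (13.3.2)
_, b32 = ols(x3, x2)           # auxiliary regression of X3 on X2

print("alpha2_hat:", round(a2, 4))
print("beta2_hat + beta3_hat * b32:", round(b2 + b3 * b32, 4))
```

Taking expectations over repeated samples (with the X's held fixed) turns this identity into Eq. (13.3.3).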

Generally, however, the extent of the bias will depend on the bias term $\beta_3 b_{32}$. If, for instance, $\beta_3$ is positive (i.e., $X_3$ has a positive effect on Y) and $b_{32}$ is positive (i.e., $X_2$ and $X_3$ are positively correlated), $\hat{\alpha}_2$, on average, will overestimate the true $\beta_2$ (i.e., positive bias). But this result should not be surprising, for $X_2$ represents not only its direct effect on Y but also its indirect effect (via $X_3$) on Y. In short, $X_2$ gets credit for the influence that is rightly attributable to $X_3$, the latter being prevented from showing its effect explicitly because it is not “allowed” to enter the model. As a concrete example, consider the example discussed in Chapter 7 (Example 7.1).


7 For an algebraic treatment, see Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, pp. 391-399. Those with a matrix algebra background may want to consult J. Johnston, Econometric Methods, 4th ed., McGraw-Hill, New York, 1997, pp. 119-112.
EXAMPLE 13.1 Illustrative Example: Child Mortality Revisited


Regressing child mortality (CM) on per capita GNP (PGNP) and the female literacy rate 
(FLR), we obtained the regression results shown in Eq. (7.6.2), giving the partial slope 
coefficient values of the two variables as -0.0056 and -2.2316, respectively. But if we 
now drop the FLR variable, we obtain the results shown in Eq. (7.7.2). If we regard 
Eq. (7.6.2) as the correct model, then Eq. (7.7.2) is a mis-specified model in that it omits 
the relevant variable FLR. Now you can see that in the correct model the coefficient of the 
PGNP variable was -0.0056, whereas in the “incorrect” model (7.7.2) it is now -0.0114.

In absolute terms, now PGNP has a greater impact on CM as compared with the true model. But if we regress FLR on PGNP (regression of the excluded variable on the included variable), the slope coefficient in this regression ($b_{32}$ in terms of Eq. [13.3.3]) is 0.00256. 8 This suggests that as PGNP increases by a unit, on average, FLR goes up by 0.00256 units. But if FLR goes up by these units, its effect on CM will be (-2.2316)(0.00256) ≈ -0.0057.

Therefore, from Eq. (13.3.3) we finally have $(\hat{\beta}_2 + \hat{\beta}_3 b_{32}) = [-0.0056 + (-2.2316)(0.00256)] \approx -0.0113$, which is about the value of the PGNP coefficient obtained in the incorrect model (7.7.2). 9 As this example illustrates, the true impact of PGNP on CM is much smaller in absolute value (-0.0056) than that suggested by the incorrect model (7.7.2), namely, (-0.0114).
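The arithmetic of this decomposition can be reproduced directly from the reported coefficients. Because those coefficients are rounded, the result can differ slightly from the -0.0114 of the incorrect model.

```python
beta2_hat = -0.0056   # PGNP coefficient in the correct model, Eq. (7.6.2)
beta3_hat = -2.2316   # FLR coefficient in the correct model, Eq. (7.6.2)
b32 = 0.00256         # slope from regressing FLR on PGNP (footnote 8)

indirect = beta3_hat * b32             # effect of PGNP on CM that runs through FLR
implied_alpha2 = beta2_hat + indirect  # right-hand side of Eq. (13.3.3)
print(f"indirect effect = {indirect:.5f}")
print(f"beta2_hat + beta3_hat * b32 = {implied_alpha2:.4f}")
```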


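Now let us examine the variances of $\hat{\alpha}_2$ and $\hat{\beta}_2$:

$\mathrm{var}(\hat{\alpha}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}$   (13.3.4)

$\mathrm{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)} = \dfrac{\sigma^2}{\sum x_{2i}^2}\,\mathrm{VIF}$   (13.3.5)

where VIF (a measure of collinearity) is the variance inflation factor [$= 1/(1 - r_{23}^2)$] discussed in Chapter 10 and $r_{23}$ is the correlation coefficient between variables $X_2$ and $X_3$; Eqs. (13.3.4) and (13.3.5) are familiar to us from Chapters 3 and 7.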

As formulas (13.3.4) and (13.3.5) are not the same, in general, $\mathrm{var}(\hat{\alpha}_2)$ will be different from $\mathrm{var}(\hat{\beta}_2)$. But we know that $\mathrm{var}(\hat{\beta}_2)$ is unbiased (why?). Therefore, $\mathrm{var}(\hat{\alpha}_2)$ is biased, thus substantiating the statement made in point 4 earlier. Since $0 < r_{23}^2 < 1$, it would seem that in the present case $\mathrm{var}(\hat{\alpha}_2) < \mathrm{var}(\hat{\beta}_2)$. Now we face a dilemma: Although $\hat{\alpha}_2$ is biased, its variance is smaller than the variance of the unbiased estimator $\hat{\beta}_2$ (of course, we are ruling out the case where $r_{23} = 0$, since in practice there is some correlation between regressors). So, there is a trade-off involved here. 10
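Dividing (13.3.5) by (13.3.4), with the same σ² in both, gives $\mathrm{var}(\hat{\beta}_2)/\mathrm{var}(\hat{\alpha}_2) = \mathrm{VIF}$. A few lines of Python tabulate how quickly this ratio grows with $r_{23}$; the correlation values are illustrative only.

```python
def vif(r23):
    """Variance inflation factor 1 / (1 - r23^2)."""
    return 1.0 / (1.0 - r23 ** 2)

# var(beta2_hat) / var(alpha2_hat) equals VIF when both formulas use the same sigma^2
for r23 in (0.0, 0.3, 0.6, 0.9, 0.99):
    print(f"r23 = {r23:.2f}  ->  variance ratio = {vif(r23):.3f}")
```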

The story is not complete yet, however, for the $\sigma^2$ estimated from model (13.3.2) and that estimated from the true model (13.3.1) are not the same because the residual sum of squares (RSS) of the two models as well as their degrees of freedom (df) are different. You may recall that we obtain an estimate of $\sigma^2$ as $\hat{\sigma}^2 = \text{RSS}/\text{df}$, which depends on the number of regressors included in the model as well as the df (= n − number of parameters


8 The regression results are: 

$\widehat{\text{FLR}}_i = 47.5971 + 0.00256\,\text{PGNP}_i$
se = (3.5553)   (0.0011)      $r^2 = 0.0721$

9 Note that in the true model $\hat{\beta}_2$ and $\hat{\beta}_3$ are unbiased estimates of their true values.

10 To bypass the trade-off between bias and efficiency, one could choose to minimize the mean square 
error (MSE), since it accounts for both bias and efficiency. On MSE, see the statistical appendix, 
Appendix A. See also Exercise 13.6. 
estimated). Now if we add variables to the model, the RSS generally decreases (recall that as more variables are added to the model, the R² increases), but the degrees of freedom also decrease because more parameters are estimated. The net outcome depends on whether the RSS decreases sufficiently to offset the loss of degrees of freedom due to the addition of regressors. If a regressor has a strong impact on the regressand (for example, if its addition reduces RSS by more than enough to offset the degree of freedom it uses up), its inclusion will not only reduce the bias but will also increase the precision (i.e., reduce the standard errors) of the estimators.

On the other hand, suppose the relevant variables have only a marginal impact on the regressand and are highly correlated with the included regressors (i.e., the VIF is large). Then adding them may reduce the bias in the coefficients of the variables already included in the model, but it will increase their standard errors (i.e., make them less efficient). Indeed, the trade-off in this situation between bias and precision can be substantial. As you can see from this discussion, the trade-off will depend on the relative importance of the various regressors.

To conclude this discussion, let us consider the special case where $r_{23} = 0$, that is, $X_2$ and $X_3$ are uncorrelated. This will result in $b_{32}$ being zero (why?). Therefore, it can be seen from Eq. (13.3.3) that $\hat{\alpha}_2$ is now unbiased. 11 Also, it seems from Eqs. (13.3.4) and (13.3.5) that the variances of $\hat{\alpha}_2$ and $\hat{\beta}_2$ are the same. Does this mean there is no harm in dropping the variable $X_3$ from the model even though it may be relevant theoretically? Generally, it does not. In this case, as noted earlier, $\operatorname{var}(\hat{\alpha}_2)$ estimated from Eq. (13.3.4) is still biased, and therefore our hypothesis-testing procedures are likely to remain suspect. 12 Besides, in most economic research $X_2$ and $X_3$ will be correlated, thus creating the problems discussed previously. The point is clear: once a model is formulated on the basis of the relevant theory, one is ill-advised to drop a variable from such a model.
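The case $r_{23} = 0$ can also be simulated. The sketch below uses a hypothetical design in which $X_2$ is symmetric around zero and $X_3 = X_2^2$, so the two regressors are exactly uncorrelated. The parameter values (β3 = 0.05, σ = 2) are illustrative assumptions. In every sample $\hat{\alpha}_2$ and $\hat{\beta}_2$ coincide. The error variance estimated from the short regression, however, also absorbs $\beta_3 X_3$, so it is biased upward. Its expected value is $\sigma^2 + \beta_3^2 \sum x_{3i}^2/(n-2)$. This is why $\operatorname{var}(\hat{\alpha}_2)$ estimated from Eq. (13.3.4) remains biased.

```python
import random
from statistics import fmean

def dev(v):
    """Deviations from the sample mean."""
    m = fmean(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def fit_short(y, x2):
    """Regress Y on X2 only; return (slope, sigma2_hat) with df = n - 2."""
    yd, a = dev(y), dev(x2)
    slope = dot(yd, a) / dot(a, a)
    rss = dot(yd, yd) - slope * dot(yd, a)
    return slope, rss / (len(y) - 2)

def fit_long(y, x2, x3):
    """Regress Y on X2 and X3; return (coefficient on X2, sigma2_hat) with df = n - 3."""
    yd, a, b = dev(y), dev(x2), dev(x3)
    s22, s33, s23 = dot(a, a), dot(b, b), dot(a, b)
    sy2, sy3 = dot(yd, a), dot(yd, b)
    den = s22 * s33 - s23 ** 2
    b2 = (sy2 * s33 - sy3 * s23) / den
    b3 = (sy3 * s22 - sy2 * s23) / den
    rss = dot(yd, yd) - b2 * sy2 - b3 * sy3
    return b2, rss / (len(y) - 3)

n = 20
x2 = [i - 10.5 for i in range(1, n + 1)]   # symmetric around zero
x3 = [c * c for c in x2]                    # uncorrelated with x2 by symmetry
beta1, beta2, beta3, sigma = 1.0, 2.0, 0.05, 2.0
print("sum of x2*x3 in deviation form:", dot(dev(x2), dev(x3)))

random.seed(7)
max_gap, s2_short, s2_long = 0.0, [], []
for _ in range(2000):
    y = [beta1 + beta2 * a + beta3 * b + random.gauss(0.0, sigma) for a, b in zip(x2, x3)]
    a2, s_short = fit_short(y, x2)
    b2, s_long = fit_long(y, x2, x3)
    max_gap = max(max_gap, abs(a2 - b2))
    s2_short.append(s_short)
    s2_long.append(s_long)

# Theoretical mean of the short-model sigma2_hat when r23 = 0.
expected_short = sigma ** 2 + beta3 ** 2 * dot(dev(x3), dev(x3)) / (n - 2)
print(f"largest |alpha2_hat - beta2_hat| = {max_gap:.2e}")
print(f"mean sigma2_hat: short = {fmean(s2_short):.3f} (theory {expected_short:.3f}), "
      f"long = {fmean(s2_long):.3f} (true sigma^2 = {sigma ** 2})")
```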

Inclusion of an Irrelevant Variable (Overfitting a Model) 

Now let us assume that 


$$Y_i = \beta_1 + \beta_2 X_{2i} + u_i \tag{13.3.6}$$

is the truth, but we fit the following model: 

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + v_i \tag{13.3.7}$$

and thus commit the specification error of including an unnecessary variable in the model. 
The consequences of this specification error are as follows: 

1. The OLS estimators of the parameters of the “incorrect” model are all unbiased and consistent, that is, $E(\hat{\alpha}_1) = \beta_1$, $E(\hat{\alpha}_2) = \beta_2$, and $E(\hat{\alpha}_3) = \beta_3 = 0$.

2. The error variance $\sigma^2$ is correctly estimated.

3. The usual confidence interval and hypothesis-testing procedures remain valid.

4. However, the estimated $\alpha$'s will generally be inefficient, that is, their variances will generally be larger than those of the $\hat{\beta}$'s of the true model. The proofs of some of these statements can be found in Appendix 13A, Section 13A.2. The point of interest here is the relative inefficiency of the $\hat{\alpha}$'s. This can be shown easily.

11 Note, though, that $\hat{\alpha}_1$ is still biased, which can be seen intuitively as follows. We know that $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$, whereas $\hat{\alpha}_1 = \bar{Y} - \hat{\alpha}_2 \bar{X}_2$, and even if $\hat{\alpha}_2 = \hat{\beta}_2$, the two intercept estimators will not be the same.

12 For details, see Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar Publisher, 1994, pp. 371-372.
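From the usual OLS formula we know that

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2} \tag{13.3.8}$$

and

$$\operatorname{var}(\hat{\alpha}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left(1 - r_{23}^2\right)} \tag{13.3.9}$$

Therefore,

$$\frac{\operatorname{var}(\hat{\alpha}_2)}{\operatorname{var}(\hat{\beta}_2)} = \frac{1}{1 - r_{23}^2} \tag{13.3.10}$$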


Since $0 < r_{23}^2 < 1$, it follows that $\operatorname{var}(\hat{\alpha}_2) > \operatorname{var}(\hat{\beta}_2)$; that is, the variance of $\hat{\alpha}_2$ is generally greater than the variance of $\hat{\beta}_2$ even though, on average, $\hat{\alpha}_2 = \beta_2$ [i.e., $E(\hat{\alpha}_2) = \beta_2$]. The implication of this finding is that the inclusion of the unnecessary variable $X_3$ makes the variance of $\hat{\alpha}_2$ larger than necessary, thereby making $\hat{\alpha}_2$ less precise. This is also true of $\hat{\alpha}_1$.
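Equation (13.3.10) is easy to tabulate. The helper below has a hypothetical name and is a minimal sketch. It returns the factor by which including the superfluous $X_3$ inflates the variance of the slope estimator for a given $r_{23}$. The standard error is inflated by the square root of that factor.

```python
from math import sqrt

def overfit_variance_ratio(r23: float) -> float:
    """var(alpha2_hat) / var(beta2_hat) = 1 / (1 - r23**2), Eq. (13.3.10)."""
    if not -1.0 < r23 < 1.0:
        raise ValueError("r23 must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r23 ** 2)

for r in (0.0, 0.5, 0.9, 0.99):
    ratio = overfit_variance_ratio(r)
    print(f"r23 = {r:4.2f}: variance x {ratio:8.3f}, standard error x {sqrt(ratio):6.3f}")
```

Only the square of $r_{23}$ matters, so negative and positive correlations of the same size inflate the variance equally.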

Notice the asymmetry in the two types of specification biases we have considered. If we exclude a relevant variable, the coefficients of the variables retained in the model are generally biased as well as inconsistent, the error variance is incorrectly estimated, and the usual hypothesis-testing procedures become invalid. On the other hand, including an irrelevant variable in the model still gives us unbiased and consistent estimates of the coefficients in the true model. The error variance is correctly estimated, and the conventional hypothesis-testing methods are still valid. The only penalty we pay for the inclusion of the superfluous variable is that the estimated variances of the coefficients are larger, and as a result our probability inferences about the parameters are less precise. One might therefore be tempted to conclude that it is better to include irrelevant variables than to omit relevant ones. But this philosophy is not to be espoused. The addition of unnecessary variables will lead to a loss in the efficiency of the estimators and may also lead to the problem of multicollinearity (why?), not to mention the loss of degrees of freedom. Therefore,

In general, the best approach is to include only explanatory variables that, on theoretical 
grounds, directly influence the dependent variable and that are not accounted for by other 
included variables. 13 


13.4 Tests of Specification Errors 

Knowing the consequences of specification errors is one thing, but finding out whether one has committed such errors is quite another, for we do not deliberately set out to commit such errors. Very often specification biases arise inadvertently, perhaps from our
inability to formulate the model as precisely as possible because the underlying theory is 
weak or because we do not have the right kind of data to test the model. As Davidson 
notes, “Because of the non-experimental nature of economics, we are never sure how the 
observed data were generated. The test of any hypothesis in economics always turns out 
to depend on additional assumptions necessary to specify a reasonably parsimonious 
model, which may or may not be justified.” 14 


13 Michael D. Intriligator, Econometric Models, Techniques and Applications, Prentice Hall, Englewood Cliffs, NJ, 1978, p. 189. Recall the Occam's razor principle.

14 James Davidson, Econometric Theory, Blackwell Publishers, Oxford, U.K., 2000, p. 153.
The practical question then is not why specification errors are made, for they generally are, but how to detect them. Once it is found that specification errors have been made, the remedies often suggest themselves. If, for example, it can be shown that a variable is inappropriately omitted from a model, the obvious remedy is to include that variable in the analysis, assuming, of course, the data on that variable are available.

In this section we discuss some tests that one may use to detect specification errors. 

Detecting the Presence of Unnecessary Variables 
(Overfitting a Model) 

Suppose we develop a k-variable model to explain a phenomenon:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i \tag{13.4.1}$$

However, we are not totally sure that, say, the variable $X_k$ really belongs in the model. One simple way to find this out is to test the significance of the estimated $\beta_k$ with the usual t test: $t = \hat{\beta}_k / \operatorname{se}(\hat{\beta}_k)$. But suppose that we are not sure whether, say, $X_2$ and $X_4$ legitimately belong in the model. This can be easily ascertained by the F test discussed in Chapter 8. Thus, detecting the presence of an irrelevant variable (or variables) is not a difficult task.
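As a reminder of the mechanics, here is a minimal sketch of the restricted-versus-unrestricted F statistic from Chapter 8. It tests whether m doubtful regressors can be dropped. The function names and the RSS values in the usage lines are hypothetical and serve only to show the arithmetic. When m = 2, the upper-tail probability of $F(2, \nu)$ has the closed form $(1 + 2F/\nu)^{-\nu/2}$ with $\nu = n - k$. This avoids the need for a statistics library, but the shortcut is valid only for 2 numerator df.

```python
def subset_f_stat(rss_restricted: float, rss_unrestricted: float,
                  m: int, n: int, k: int) -> float:
    """F = [(RSS_R - RSS_UR)/m] / [RSS_UR/(n - k)] for H0: the m extra coefficients are zero.

    k counts all parameters (including the intercept) of the unrestricted model.
    """
    return ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / (n - k))

def f_upper_tail_two_num_df(f: float, v: int) -> float:
    """P(F > f) for an F(2, v) variable; the closed form holds only for 2 numerator df."""
    return (1.0 + 2.0 * f / v) ** (-v / 2.0)

# Hypothetical numbers: dropping 2 regressors from a 5-parameter model, n = 30.
F = subset_f_stat(rss_restricted=120.0, rss_unrestricted=100.0, m=2, n=30, k=5)
p = f_upper_tail_two_num_df(F, v=30 - 5)
print(f"F = {F:.3f}, p-value = {p:.4f}")
```

With these made-up numbers the two regressors would not be judged jointly significant at the 5 percent level.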

It is, however, very important to remember that in carrying out these tests of significance 
we have a specific model in mind. We accept that model as the maintained hypothesis or 
the “truth,” however tentative it may be. Given that model, then, we can find out whether 
one or more regressors are really relevant by the usual t and F tests. But note carefully that 
we should not use the t and F tests to build a model iteratively. That is, we should not say that initially Y is related to $X_2$ only because $\hat{\beta}_2$ is statistically significant, then expand the model to include $X_3$ and decide to keep that variable in the model if $\hat{\beta}_3$ turns out to be statistically significant, and so on. This strategy of building a model is called the bottom-up approach (starting with a smaller model and expanding it as one goes along) or, by the somewhat pejorative term, data mining (other names are regression fishing, data grubbing, data snooping, and number crunching).

The primary objective of data mining is to develop the “best” model after several diagnostic tests. The model finally chosen is then a “good” model in the sense that all the estimated coefficients have the “right” signs, they are statistically significant on the basis of the t and F tests, the $R^2$ value is reasonably high, the Durbin-Watson d has an acceptable value (around 2), etc. The purists in the profession look down on the practice of data mining. In the words of William Poole, “. . . making an empirical regularity the foundation, rather than an implication of economic theory, is always dangerous.” 15 One reason for “condemning” data mining is as follows.

Nominal versus True Level of Significance in the Presence of Data Mining

A danger of data mining that the unwary researcher faces is that the conventional levels of significance ($\alpha$), such as 1, 5, or 10 percent, are not the true levels of significance. Lovell has suggested the following relationship. If there are c candidate regressors out of which k are finally selected (k < c) on the basis of data mining, then the true level of significance ($\alpha^*$) is related to the nominal level of significance ($\alpha$) as follows: 16

$$\alpha^* = 1 - (1 - \alpha)^{c/k} \tag{13.4.2}$$

15 William Poole, "Is Inflation Too Low?" Cato Journal, vol. 18, no. 3, Winter 1999, p. 456.

16 M. Lovell, "Data Mining," Review of Economics and Statistics, vol. 65, 1983, pp. 1-12.




or approximately as

$$\alpha^* \approx (c/k)\,\alpha \tag{13.4.3}$$

For example, if c = 15, k = 5, and α = 5 percent, from Eq. (13.4.3) the true level of significance is (15/5)(5) = 15 percent (the exact expression (13.4.2) gives $1 - 0.95^3 \approx 14.3$ percent). Suppose a researcher data-mines, selects 5 out of 15 regressors, reports only the results of the condensed model at the nominal 5 percent level of significance, and declares that the results are statistically significant. One should take this conclusion with a big grain of salt, for we know the (true) level of significance is in fact about 15 percent. It should be noted that if c = k, that is, there is no data mining, the true and nominal levels of significance are the same. Of course, in practice most researchers report only the results of their “final” regression without necessarily telling about all the data mining, or pretesting, that has gone before. 17
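Both versions of Lovell's adjustment are one-liners. The sketch below uses hypothetical function names and reproduces the example above: c = 15, k = 5, and a nominal α of 5 percent give 15 percent by the approximation (13.4.3) and about 14.3 percent by the exact expression (13.4.2). The approximation is the first-order expansion $(1 - \alpha)^{c/k} \approx 1 - (c/k)\alpha$, so it is close only when α is small.

```python
def lovell_true_alpha(alpha: float, c: int, k: int) -> float:
    """Exact true level of significance, Eq. (13.4.2): 1 - (1 - alpha)**(c/k)."""
    return 1.0 - (1.0 - alpha) ** (c / k)

def lovell_approx_alpha(alpha: float, c: int, k: int) -> float:
    """First-order approximation, Eq. (13.4.3): (c/k) * alpha."""
    return (c / k) * alpha

for c, k in [(15, 5), (10, 5), (5, 5)]:
    print(f"c = {c:2d}, k = {k}: exact {lovell_true_alpha(0.05, c, k):.4f}, "
          f"approx {lovell_approx_alpha(0.05, c, k):.4f}")
```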

Despite some of its obvious drawbacks, there is increasing recognition, especially 
among applied econometricians, that the purist (i.e., non-data mining) approach to model 
building is not tenable. As Zaman notes: 

Unfortunately, experience with real data sets shows that such a [purist approach] is neither feasible nor desirable. It is not feasible because it is a rare economic theory which leads to a unique model. It is not desirable because a crucial aspect of learning from the data is learning what types of models are and are not supported by data. Even if, by rare luck, the initial model shows a good fit, it is frequently important to explore and learn the types of the models the data does or does not agree with. 18

A similar view is expressed by Kerry Patterson, who maintains that: 

This [data mining] approach suggests that economic theory and empirical specification [should] interact rather than be kept in separate compartments. 19

Instead of getting caught in the data mining versus the purist approach to model-building 
controversy, one can endorse the view expressed by Peter Kennedy: 

[that model specification] needs to be a well-thought-out combination of theory and data, and that testing procedures used in specification searches should be designed to minimize the costs of data mining. Examples of such procedures are setting aside data for out-of-sample prediction tests, adjusting significance levels [à la Lovell], and avoiding questionable criteria such as maximizing $R^2$. 20

If we look at data mining in a broader perspective as a process of discovering empirical regularities that might suggest errors and/or omissions in (existing) theoretical models, it has a very useful role to play. To quote Kennedy again, “The art of the applied econometrician is to allow for data-driven theory while avoiding the considerable dangers in data mining.” 21


17 For a detailed discussion of pretesting and the biases it can lead to, see T. D. Wallace, "Pretest Estimation in Regression: A Survey," American Journal of Agricultural Economics, vol. 59, 1977, pp. 431-443.

18 Asad Zaman, Statistical Foundations for Econometric Techniques, Academic Press, New York, 1996, p. 226.

19 Kerry Patterson, An Introduction to Applied Econometrics, St. Martin's Press, New York, 2000, p. 10.

20 Peter Kennedy, "Sinning in the Basement: What Are the Rules? The Ten Commandments of Applied Econometrics," unpublished manuscript.

21 Kennedy, op. cit., p. 13.
Tests for Omitted Variables and Incorrect Functional Form

In practice we are never sure that the model adopted for empirical testing is “the truth, the 
whole truth and nothing but the truth.” On the basis of theory or introspection and prior 
empirical work, we develop a model that we believe captures the essence of the subject 
under study. We then subject the model to empirical testing. After we obtain the results, we 
begin the post-mortem, keeping in mind the criteria of a good model discussed earlier. It 
is at this stage that we come to know if the chosen model is adequate. In determining model 
adequacy, we look at some broad features of the results, such as the R 2 value, the estimated 
t ratios, the signs of the estimated coefficients in relation to their prior expectations, the 
Durbin-Watson statistic, and the like. If these diagnostics are reasonably good, we proclaim that the chosen model is a fair representation of reality. By the same token, if the results do not look encouraging because the $R^2$ value is too low or because very few coefficients are statistically significant or have the correct signs or because the Durbin-Watson
d is too low, then we begin to worry about model adequacy and look for remedies: Maybe 
we have omitted an important variable, or have used the wrong functional form, or have not 
first-differenced the time series (to remove serial correlation), and so on. To aid us in 
determining whether model inadequacy is on account of one or more of these problems, we 
can use some of the following methods. 

Examination of Residuals 

As noted in Chapter 12, examination of the residuals is a good visual diagnostic to detect 
autocorrelation or heteroscedasticity. But these residuals can also be examined, especially 
in cross-sectional data, for model specification errors, such as omission of an important 
variable or incorrect functional form. If in fact there are such errors, a plot of the residuals 
will exhibit distinct patterns. 

To illustrate, let us reconsider the cubic total cost of production function first considered in Chapter 7. Assume that the true total cost function is described as follows, where Y = total cost and X = output:

$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_i \tag{13.4.4}$$

but a researcher fits the following quadratic function:

$$Y_i = \alpha_1 + \alpha_2 X_i + \alpha_3 X_i^2 + u_{2i} \tag{13.4.5}$$

and another researcher fits the following linear function:

$$Y_i = \lambda_1 + \lambda_2 X_i + u_{3i} \tag{13.4.6}$$

Although we know that both researchers have made specification errors, for pedagogical 
purposes let us see how the estimated residuals look in the three models. (The cost-output 
data are given in Table 7.4.) Figure 13.1 speaks for itself: As we move from left to right, that 
is, as we approach the truth, not only are the residuals smaller (in absolute value) but also 
they do not exhibit the pronounced cyclical swings associated with the misfitted models. 

The utility of examining the residual plot is thus clear: If there are specification errors, 
the residuals will exhibit noticeable patterns. 
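The residuals in Table 13.1 can be reproduced with a short least-squares routine. Table 7.4 is not reproduced in this section, so the cost figures below are a reconstruction. They were obtained by adding the linear-model residuals of Table 13.1 to the fitted values 166.467 + 19.933X, assuming output X runs from 1 to 10. The helper name is hypothetical. It orthogonalizes the regressor columns (Gram-Schmidt, applied twice) instead of inverting $X'X$, which is a choice made here only for numerical robustness.

```python
from math import sqrt

def ols_residuals(columns, y):
    """Residuals from the OLS regression of y on the given columns.

    Include a column of ones for the intercept. The columns are orthonormalized
    by Gram-Schmidt (two passes for stability) and y is projected off them.
    """
    basis = []
    for col in columns:
        v = [float(c) for c in col]
        for _ in range(2):
            for q in basis:
                p = sum(qi * vi for qi, vi in zip(q, v))
                v = [vi - p * qi for vi, qi in zip(v, q)]
        norm = sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    r = [float(yi) for yi in y]
    for q in basis:
        p = sum(qi * ri for qi, ri in zip(q, r))
        r = [ri - p * qi for ri, qi in zip(r, q)]
    return r

X = list(range(1, 11))                                    # output (assumed 1..10)
Y = [193, 226, 240, 244, 257, 260, 274, 297, 350, 420]    # total cost (reconstructed)
ones = [1] * len(X)
models = {
    "linear": [ones, X],
    "quadratic": [ones, X, [x ** 2 for x in X]],
    "cubic": [ones, X, [x ** 2 for x in X], [x ** 3 for x in X]],
}
resid = {name: ols_residuals(cols, Y) for name, cols in models.items()}

print("obs    linear quadratic     cubic")
for i in range(len(X)):
    print(f"{i + 1:3d}" + "".join(f"{resid[name][i]:10.3f}" for name in models))
```

Plotting each column of `resid` against observation number should give the three panels of Figure 13.1.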

The Durbin-Watson d Statistic Once Again 

If we examine the routinely calculated Durbin-Watson d in Table 13.1, we see that for the linear cost function the estimated d is 0.716, suggesting that there is positive “correlation” in the estimated residuals: for n = 10 and k' = 1, the 5 percent critical d values are
FIGURE 13.1 Residuals $\hat{u}_i$ from (a) linear, (b) quadratic, and (c) cubic total cost functions.

TABLE 13.1 Estimated Residuals from the Linear, Quadratic, and Cubic Total Cost Functions

Observation    û_i,              û_i,                 û_i,
Number         Linear Model*     Quadratic Model†     Cubic Model**
  1               6.600           -23.900               -0.222
  2              19.667             9.500                1.607
  3              13.733            18.817               -0.915
  4              -2.200            13.050               -4.426
  5              -9.133            11.200                4.435
  6             -26.067            -5.733                1.032
  7             -32.000           -16.750                0.726
  8             -28.933           -23.850               -4.119
  9               4.133            -6.033                1.859
 10              54.200            23.700                0.022

*$\hat{Y}_i = 166.467 + 19.933X_i$; se = (19.021) (3.066); t = (8.752) (6.502); $R^2$ = 0.8409, $\bar{R}^2$ = 0.8210, d = 0.716.

†$\hat{Y}_i = 222.383 - 8.0250X_i + 2.542X_i^2$; se of the slope coefficients = (9.809) (0.869); t = (-0.818) (2.925); $R^2$ = 0.9284, $\bar{R}^2$ = 0.9079, d = 1.038.

**$\hat{Y}_i = 141.767 + 63.478X_i - 12.962X_i^2 + 0.939X_i^3$; se = (6.375) (4.778) (0.9856) (0.0592); t = (22.238) (13.285) (-13.151) (15.861); $R^2$ = 0.9983, $\bar{R}^2$ = 0.9975, d = 2.70.

$d_L = 0.879$ and $d_U = 1.320$. Likewise, the computed d value for the quadratic cost function is 1.038, whereas the 5 percent critical values are $d_L = 0.697$ and $d_U = 1.641$, indicating indecision. But if we use the modified d test (see Chapter 12), we can say that there is positive “correlation” in the residuals, for the computed d is less than $d_U$. For the cubic cost function, the true specification, the estimated d value does not indicate any positive “correlation” in the residuals. 22
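The d values quoted above can be recomputed directly from the residuals printed in Table 13.1. The sketch below applies the usual d formula. It then compares the linear and quadratic results with the 5 percent bounds given in the text. No bounds are quoted for the cubic model, so none are used for it. Variable names are illustrative.

```python
def durbin_watson(u):
    """d = sum_{t=2..n} (u_t - u_{t-1})**2 / sum_{t=1..n} u_t**2."""
    num = sum((u[t] - u[t - 1]) ** 2 for t in range(1, len(u)))
    return num / sum(ut ** 2 for ut in u)

# Residuals as printed in Table 13.1 (rounded to three decimals).
table_13_1 = {
    "linear": [6.600, 19.667, 13.733, -2.200, -9.133, -26.067, -32.000, -28.933, 4.133, 54.200],
    "quadratic": [-23.900, 9.500, 18.817, 13.050, 11.200, -5.733, -16.750, -23.850, -6.033, 23.700],
    "cubic": [-0.222, 1.607, -0.915, -4.426, 4.435, 1.032, 0.726, -4.119, 1.859, 0.022],
}
# 5 percent bounds (dL, dU) quoted in the text for n = 10.
bounds = {"linear": (0.879, 1.320), "quadratic": (0.697, 1.641)}

d_values = {name: durbin_watson(u) for name, u in table_13_1.items()}
for name, d in d_values.items():
    verdict = ""
    if name in bounds:
        d_lower, d_upper = bounds[name]
        if d < d_lower:
            verdict = "below dL: positive 'correlation'"
        elif d < d_upper:
            verdict = "between dL and dU: inconclusive"
        else:
            verdict = "above dU: no positive 'correlation'"
    print(f"{name:9s} d = {d:.3f}  {verdict}")
```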

The observed positive “correlation” in the residuals when we fit the linear or quadratic model is not a measure of (first-order) serial correlation but of (model) specification error(s). The observed correlation simply reflects the fact that some variable(s) that belong in the model are included in the error term and need to be culled out of it and introduced in their own right as explanatory variables. If we exclude $X_i^3$ from the cost function, then, as Eq. (13.2.3) shows, the error term in the mis-specified model (13.2.2) is in fact $(u_{1i} + \beta_4 X_i^3)$. It will exhibit a systematic pattern (e.g., positive autocorrelation) if $X_i^3$ in fact affects Y significantly.

22 In the present context, a value of d = 2 will mean no specification error. (Why?)

To use the Durbin-Watson test for detecting model specification error(s), we proceed as 
follows: 

1. From the assumed model, obtain the ordinary least squares (OLS) residuals. 

2. If it is believed that the assumed model is mis-specified because it excludes a relevant explanatory variable, say, Z, order the residuals obtained in Step 1 according to increasing values of Z. Note: The Z variable could be one of the X variables included in the assumed model or it could be some function of that variable, such as $X^2$ or $X^3$.

3. Compute the d statistic from the residuals thus ordered by the usual d formula, namely,

$$d = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2}$$

Note: The subscript t is the index of observation here and does not necessarily mean that the data are time series.

4. From the Durbin-Watson tables, if the estimated d value is significant, then one can accept the hypothesis of model mis-specification. If that turns out to be the case, the remedial measures will naturally suggest themselves.

In our cost example, the Z (= X) variable (output) was already ordered. 23 Therefore, we do not have to compute the d statistic afresh. As we have seen, the d statistic for both the linear and quadratic cost functions suggests specification errors. The remedies are clear: Introduce the quadratic and cubic terms in the linear cost function and the cubic term in the quadratic cost function. In short, run the cubic cost model.
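The four-step procedure can be sketched as follows. It again uses the cost data reconstructed from Table 13.1, which is an assumption because Table 7.4 is not reproduced here. The function names are hypothetical. To show why Step 2 matters, the observations are first shuffled into an arbitrary order. The d computed in that order has no particular meaning. Ordering the residuals by Z = X, or by $X^2$ as footnote 23 notes, recovers the value of about 0.716 for the linear model.

```python
import random

def linear_residuals(x, y):
    """Step 1: OLS residuals from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - a - b * xi for xi, yi in zip(x, y)]

def durbin_watson(u):
    """Step 3: the usual d formula applied to residuals in the given order."""
    num = sum((u[t] - u[t - 1]) ** 2 for t in range(1, len(u)))
    return num / sum(ut ** 2 for ut in u)

def dw_specification_d(x, y, z):
    """Steps 1-3: residuals, reordered by increasing z (Step 2), then d."""
    u = linear_residuals(x, y)
    ordered = [ui for _, ui in sorted(zip(z, u))]
    return durbin_watson(ordered)

X = list(range(1, 11))
Y = [193, 226, 240, 244, 257, 260, 274, 297, 350, 420]

pairs = list(zip(X, Y))
random.seed(3)
random.shuffle(pairs)                  # pretend the data arrived in arbitrary order
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

d_arbitrary = durbin_watson(linear_residuals(xs, ys))
d_by_x = dw_specification_d(xs, ys, z=xs)
d_by_x2 = dw_specification_d(xs, ys, z=[x ** 2 for x in xs])
print(f"d in arbitrary order: {d_arbitrary:.3f}")
print(f"d ordered by X: {d_by_x:.3f}, ordered by X^2: {d_by_x2:.3f}")
```

Step 4 then compares the ordered d with the tabulated bounds, as in the calculation above.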

Ramsey’s RESET Test

Ramsey has proposed a general test of specification error called RESET (regression specification error test). 24 Here we will illustrate only the simplest version of the test. To fix ideas, let us continue with our cost-output example and assume that the cost function is linear in output as

$$Y_i = \lambda_1 + \lambda_2 X_i + u_{3i} \tag{13.4.6}$$

where Y = total cost and X = output. Now if we plot the residuals $\hat{u}_i$ obtained from this regression against $\hat{Y}_i$, the estimated $Y_i$ from this model, we get the picture shown in Figure 13.2. Although $\sum \hat{u}_i$ and $\sum \hat{u}_i \hat{Y}_i$ are necessarily zero (why? see Chapter 3), the residuals in this figure show a pattern in which their mean changes systematically with $\hat{Y}_i$. This would suggest that if we introduce $\hat{Y}_i$ in some form as a regressor(s) in Eq. (13.4.6), it should increase $R^2$. And if the increase in $R^2$ is statistically significant (on the basis of the F test discussed in Chapter 8), it would suggest that the linear cost function (13.4.6) was

23 It does not matter if we order $\hat{u}_i$ according to $X_i^2$ or $X_i^3$ since these are functions of $X_i$, which is already ordered.

24 J. B. Ramsey, "Tests for Specification Errors in Classical Linear Least Squares Regression Analysis," Journal of the Royal Statistical Society, series B, vol. 31, 1969, pp. 350-371.
FIGURE 13.2 Residuals $\hat{u}_i$ and estimated Y from the linear cost function: $Y_i = \lambda_1 + \lambda_2 X_i + u_i$.
mis-specified. This is essentially the idea behind RESET. The steps involved in RESET are as follows:

1. From the chosen model, e.g., Eq. (13.4.6), obtain the estimated $Y_i$, that is, $\hat{Y}_i$.

2. Rerun Eq. (13.4.6) introducing $\hat{Y}_i$ in some form as an additional regressor(s). From Figure 13.2, we observe that there is a curvilinear relationship between $\hat{u}_i$ and $\hat{Y}_i$, suggesting that one can introduce $\hat{Y}_i^2$ and $\hat{Y}_i^3$ as additional regressors. Thus, we run

$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 \hat{Y}_i^2 + \beta_4 \hat{Y}_i^3 + u_i \tag{13.4.7}$$

3. Let the $R^2$ obtained from Eq. (13.4.7) be $R^2_{\text{new}}$ and that obtained from Eq. (13.4.6) be $R^2_{\text{old}}$. Then we can use the F test first introduced in Eq. (8.4.18), namely,

$$F = \frac{(R^2_{\text{new}} - R^2_{\text{old}})/\text{number of new regressors}}{(1 - R^2_{\text{new}})/(n - \text{number of parameters in the new model})} \tag{8.4.18}$$

to find out if the increase in $R^2$ from using Eq. (13.4.7) is statistically significant.

4. If the computed F value is significant, say, at the 5 percent level, one can accept the 
hypothesis that the model (13.4.6) is mis-specified. 

Returning to our illustrative example, we have the following results (standard errors in 
parentheses): 

$$\hat{Y}_i = 166.467 + 19.933X_i \tag{13.4.8}$$
se = (19.021) (3.066)    $R^2 = 0.8409$

$$\hat{Y}_i = 2140.7223 + 476.6557X_i - 0.09187\hat{Y}_i^2 + 0.000119\hat{Y}_i^3 \tag{13.4.9}$$
se = (132.0044) (33.3951) (0.00620) (0.0000074)    $R^2 = 0.9983$

Note: $\hat{Y}_i^2$ and $\hat{Y}_i^3$ in Eq. (13.4.9) are obtained from Eq. (13.4.8).

Now applying the F test we find 

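$$F = \frac{(0.9983 - 0.8409)/2}{(1 - 0.9983)/(10 - 4)} = 284.4035 \tag{13.4.10}$$

(The value 284.4035 is obtained from the unrounded $R^2$ values; substituting the rounded values shown gives about 278.)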

The reader can easily verify that this F value is highly significant, indicating that the model (13.4.8) is mis-specified. Of course, we have reached the same conclusion on the basis of the visual examination of the residuals as well as the Durbin-Watson d value. It should be added that, since $\hat{Y}_i$ is estimated, it is a random variable and, therefore, the usual tests of significance apply if the sample is reasonably large.
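The RESET computations can be sketched in a few lines on the cost data reconstructed from Table 13.1, which is an assumption. Helper names are hypothetical. Because $\hat{Y}_i$ is a linear function of $X_i$, adding $\hat{Y}_i^2$ and $\hat{Y}_i^3$ spans exactly the same columns as adding $X_i^2$ and $X_i^3$. $R^2_{\text{new}}$ therefore coincides with the cubic model's 0.9983. Working from unrounded $R^2$ values, the F statistic should come out at about 284.4, in line with Eq. (13.4.10). The p-value uses the closed-form upper tail of an F variable with 2 numerator df. That shortcut holds only in this case.

```python
from math import sqrt

def ols_residuals(columns, y):
    """OLS residuals of y on the given columns (Gram-Schmidt, two passes)."""
    basis = []
    for col in columns:
        v = [float(c) for c in col]
        for _ in range(2):
            for q in basis:
                p = sum(qi * vi for qi, vi in zip(q, v))
                v = [vi - p * qi for vi, qi in zip(v, q)]
        norm = sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    r = [float(yi) for yi in y]
    for q in basis:
        p = sum(qi * ri for qi, ri in zip(q, r))
        r = [ri - p * qi for ri, qi in zip(r, q)]
    return r

def r_squared(columns, y):
    """Centered R^2 of the OLS regression of y on the columns."""
    u = ols_residuals(columns, y)
    ybar = sum(y) / len(y)
    return 1.0 - sum(ui * ui for ui in u) / sum((yi - ybar) ** 2 for yi in y)

X = [float(x) for x in range(1, 11)]
Y = [193.0, 226.0, 240.0, 244.0, 257.0, 260.0, 274.0, 297.0, 350.0, 420.0]
n = len(Y)
ones = [1.0] * n

# Step 1: fit the linear cost function (13.4.6) and keep the fitted values.
base = [ones, X]
y_hat = [yi - ui for yi, ui in zip(Y, ols_residuals(base, Y))]
r2_old = r_squared(base, Y)

# Step 2: rerun with y_hat**2 and y_hat**3 added, as in Eq. (13.4.7).
r2_new = r_squared(base + [[v ** 2 for v in y_hat], [v ** 3 for v in y_hat]], Y)

# Step 3: F test of Eq. (8.4.18) with 2 new regressors and 4 parameters.
m, k_new = 2, 4
df = n - k_new
F = ((r2_new - r2_old) / m) / ((1.0 - r2_new) / df)
p_value = (1.0 + 2.0 * F / df) ** (-df / 2.0)   # valid for numerator df = 2 only
print(f"R2_old = {r2_old:.4f}, R2_new = {r2_new:.4f}, F = {F:.2f}, p = {p_value:.2e}")
```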

One advantage of RESET is that it is easy to apply, for it does not require one to specify what the alternative model is. But that is also its disadvantage, because knowing that a model is mis-specified does not necessarily help us in choosing a better alternative.

As one author notes: 

In practice, the RESET test may not be particularly good at detecting any specific alternative 
to a proposed model, and its usefulness lies in acting as a general indicator that something is 
wrong. For this reason, a test such as RESET is sometimes described as a test of misspecification, as opposed to a test of specification. This distinction is rather subtle, but the basic idea is
that a specification test looks at some particular aspect of a given equation, with clear null and 
alternative hypotheses in mind. A misspecification test, on the other hand, can detect a range of 
alternatives and indicate that something is wrong under the null, without necessarily giving 
clear guidance as to what alternative hypothesis is appropriate. 25 

Lagrange Multiplier (LM) Test for Adding Variables 

This is an alternative to Ramsey’s RESET test. To illustrate this test, we will continue with 
the preceding illustrative example. 

If we compare the linear cost function (13.4.6) with the cubic cost function (13.4.4), the 
former is a restricted version of the latter (recall our discussion of restricted least squares 
from Chapter 8). The restricted regression (13.4.6) assumes that the coefficients of the 
squared and cubed output terms are equal to zero. To test this, the LM test proceeds as 
follows: 

1. Estimate the restricted regression (13.4.6) by OLS and obtain the residuals, $\hat{u}_i$.

2. If in fact the unrestricted regression (13.4.4) is the true regression, the residuals obtained in Eq. (13.4.6) should be related to the squared and cubed output terms, that is, $X_i^2$ and $X_i^3$.

3. This suggests that we regress the $\hat{u}_i$ obtained in Step 1 on all the regressors (including those in the restricted regression), which in the present case means

$$\hat{u}_i = \alpha_1 + \alpha_2 X_i + \alpha_3 X_i^2 + \alpha_4 X_i^3 + v_i \tag{13.4.11}$$

where v is an error term with the usual properties. 

25 Jon Stewart and Len Gill, Econometrics, 2d ed., Prentice-Hall Europe, 1998, p. 69.
4. For large-sample size, Engle has shown that n (the sample size) times the $R^2$ estimated from the (auxiliary) regression (13.4.11) follows the chi-square distribution. Its df equal the number of restrictions imposed by the restricted regression, two in the present example since the terms $X_i^2$ and $X_i^3$ are dropped from the model. 26 Symbolically, we write

$$nR^2 \underset{\text{asy}}{\sim} \chi^2_{(\text{number of restrictions})} \tag{13.4.12}$$

where asy means asymptotically, that is, in large samples. 

5. If the chi-square value obtained from Eq. (13.4.12) exceeds the critical chi-square 
value at the chosen level of significance, we reject the restricted regression. Otherwise, we 
do not reject it. 

For our example, the regression results are as follows: 

$$\hat{Y}_i = 166.467 + 19.933X_i \tag{13.4.13}$$

where Y is total cost and X is output. The standard errors for this regression are already 
given in Table 13.1. 

When the residuals from Eq. (13.4.13) are regressed as just suggested in Step 3, we obtain the following results:

$$\hat{u}_i = -24.7 + 43.5443X_i - 12.9615X_i^2 + 0.9396X_i^3 \tag{13.4.14}$$
se = (6.375) (4.779) (0.986) (0.059)    $R^2 = 0.9896$

Although our sample size of 10 is by no means large, just to illustrate the LM mechanism, we obtain $nR^2 = (10)(0.9896) = 9.896$. From the chi-square table we observe that for 2 df the 1 percent critical chi-square value is about 9.21. Therefore, the observed value of 9.896 is significant at the 1 percent level, and our conclusion would be to reject the restricted regression (i.e., the linear cost function). We reached a similar conclusion on the basis of Ramsey’s RESET test.
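A sketch of the LM steps on the same reconstructed cost data follows. The data are an assumption, as before, and the helper name is hypothetical. For 2 df the chi-square upper tail is simply $e^{-x/2}$. Both the 1 percent critical value ($-2\ln 0.01 \approx 9.21$) and the p-value can therefore be computed without tables. That shortcut holds only for 2 df.

```python
from math import exp, log, sqrt

def ols_residuals(columns, y):
    """OLS residuals of y on the given columns (Gram-Schmidt, two passes)."""
    basis = []
    for col in columns:
        v = [float(c) for c in col]
        for _ in range(2):
            for q in basis:
                p = sum(qi * vi for qi, vi in zip(q, v))
                v = [vi - p * qi for vi, qi in zip(v, q)]
        norm = sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    r = [float(yi) for yi in y]
    for q in basis:
        p = sum(qi * ri for qi, ri in zip(q, r))
        r = [ri - p * qi for ri, qi in zip(r, q)]
    return r

X = [float(x) for x in range(1, 11)]
Y = [193.0, 226.0, 240.0, 244.0, 257.0, 260.0, 274.0, 297.0, 350.0, 420.0]
n = len(Y)
ones = [1.0] * n

# Step 1: residuals from the restricted (linear) regression (13.4.13).
u = ols_residuals([ones, X], Y)

# Step 3: auxiliary regression (13.4.11) of u on all regressors of the cubic model.
aux = [ones, X, [x ** 2 for x in X], [x ** 3 for x in X]]
v = ols_residuals(aux, u)
u_bar = sum(u) / n
r2_aux = 1.0 - sum(vi * vi for vi in v) / sum((ui - u_bar) ** 2 for ui in u)

# Step 4: n * R^2 is asymptotically chi-square with 2 df (two restrictions).
lm = n * r2_aux
critical_1pct = -2.0 * log(0.01)      # chi-square(2) critical value at 1 percent
p_value = exp(-lm / 2.0)              # chi-square(2) upper-tail probability
print(f"R2_aux = {r2_aux:.4f}, nR2 = {lm:.3f}, 1% critical = {critical_1pct:.2f}, p = {p_value:.4f}")
```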

13.5 Errors of Measurement 


All along we have assumed implicitly that the dependent variable Y and the explanatory variables, the X's, are measured without any errors. Thus, in the regression of consumption expenditure on income and wealth of households, we assume that the data on these variables are “accurate”; they are not guess estimates, extrapolated, interpolated, or rounded off in any systematic manner, such as to the nearest hundredth dollar, and so on. Unfortunately, this ideal is not met in practice for a variety of reasons, such as nonresponse errors, reporting errors, and computing errors. Whatever the reasons, error of measurement is a potentially troublesome problem, for it constitutes yet another example of specification bias with the consequences noted below.

Errors of Measurement in the Dependent Variable Y 

Consider the following model: 

$$Y_i^* = \alpha + \beta X_i + u_i \tag{13.5.1}$$


26 R. F. Engle, "A General Approach to Lagrangian Multiplier Model Diagnostics," Journal of Econometrics, vol. 20, 1982, pp. 83-104.
where $Y_i^*$ = permanent consumption expenditure 27
$X_i$ = current income
$u_i$ = stochastic disturbance term

Since $Y_i^*$ is not directly measurable, we may use an observable expenditure variable $Y_i$ such that

$$Y_i = Y_i^* + \varepsilon_i \tag{13.5.2}$$

where $\varepsilon_i$ denotes errors of measurement in $Y_i^*$. Therefore, instead of estimating Eq. (13.5.1), we estimate

$$\begin{aligned} Y_i &= (\alpha + \beta X_i + u_i) + \varepsilon_i \\ &= \alpha + \beta X_i + (u_i + \varepsilon_i) \\ &= \alpha + \beta X_i + v_i \end{aligned} \tag{13.5.3}$$

where $v_i = u_i + \varepsilon_i$ is a composite error term, containing the population disturbance term (which may be called the equation error term) and the measurement error term.

For simplicity assume the following:

- $E(u_i) = E(\varepsilon_i) = 0$.
- $\operatorname{cov}(X_i, u_i) = 0$, which is the assumption of the classical linear regression.
- $\operatorname{cov}(X_i, \varepsilon_i) = 0$, that is, the errors of measurement in $Y_i^*$ are uncorrelated with $X_i$.
- $\operatorname{cov}(u_i, \varepsilon_i) = 0$, that is, the equation error and the measurement error are uncorrelated.

With these assumptions, it can be seen that $\hat{\beta}$ estimated from either Eq. (13.5.1) or Eq. (13.5.3) will be an unbiased estimator of the true $\beta$ (see Exercise 13.7). That is, the errors of measurement in the dependent variable Y do not destroy the unbiasedness property of the OLS estimators. However, the variances and standard errors of $\hat{\beta}$ estimated from Eqs. (13.5.1) and (13.5.3) will be different because, employing the usual formulas (see Chapter 3), we obtain

Model (13.5.1):

$$\operatorname{var}(\hat{\beta}) = \frac{\sigma_u^2}{\sum x_i^2} \tag{13.5.4}$$

Model (13.5.3):

$$\operatorname{var}(\hat{\beta}) = \frac{\sigma_v^2}{\sum x_i^2} = \frac{\sigma_u^2 + \sigma_\varepsilon^2}{\sum x_i^2} \tag{13.5.5}$$


Obviously, the latter variance is larger than the former. 28 Therefore, although the errors of measurement in the dependent variable still give unbiased estimates of the parameters and their variances, the estimated variances are now larger than in the case where there are no such errors of measurement.
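A small simulation illustrates this result. The design is hypothetical: fixed $X_i = 1, \dots, 50$, α = 10, β = 0.8, $\sigma_u = 2$, and measurement error with $\sigma_\varepsilon = 3$. With and without the measurement error, the slope estimates should both average about 0.8. The ratio of their sampling variances should be close to $(\sigma_u^2 + \sigma_\varepsilon^2)/\sigma_u^2 = 13/4$, as Eqs. (13.5.4) and (13.5.5) imply.

```python
import random
from statistics import fmean, pvariance

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sum((xi - mx) ** 2 for xi in x)

alpha, beta, sigma_u, sigma_eps = 10.0, 0.8, 2.0, 3.0
X = [float(i) for i in range(1, 51)]

random.seed(11)
b_clean, b_noisy = [], []
for _ in range(3000):
    y_star = [alpha + beta * x + random.gauss(0.0, sigma_u) for x in X]   # Eq. (13.5.1)
    y_obs = [ys + random.gauss(0.0, sigma_eps) for ys in y_star]          # Eq. (13.5.2)
    b_clean.append(slope(X, y_star))
    b_noisy.append(slope(X, y_obs))

ratio = pvariance(b_noisy) / pvariance(b_clean)
print(f"mean slope: without error {fmean(b_clean):.4f}, with error {fmean(b_noisy):.4f}")
print(f"variance ratio {ratio:.2f} (theory {(sigma_u ** 2 + sigma_eps ** 2) / sigma_u ** 2:.2f})")
```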

Errors of Measurement in the Explanatory Variable X 

Now assume that instead of Eq. (13.5.1), we have the following model: 

$$Y_i = \alpha + \beta X_i^* + u_i \tag{13.5.6}$$

where $Y_i$ = current consumption expenditure
$X_i^*$ = permanent income
$u_i$ = disturbance term (equation error)


27 This phrase is due to Milton Friedman. See also Exercise 13.8.

28 But note that this variance is still unbiased because under the stated conditions the composite error term $v_i = u_i + \varepsilon_i$ still satisfies the assumptions underlying the method of least squares.
Suppose instead of observing $X_i^*$, we observe

$$X_i = X_i^* + w_i \tag{13.5.7}$$

where $w_i$ represents errors of measurement in $X_i^*$. Therefore, instead of estimating Eq. (13.5.6), we estimate

$$\begin{aligned} Y_i &= \alpha + \beta(X_i - w_i) + u_i \\ &= \alpha + \beta X_i + (u_i - \beta w_i) \\ &= \alpha + \beta X_i + z_i \end{aligned} \tag{13.5.8}$$

where $z_i = u_i - \beta w_i$, a compound of equation and measurement errors.

Now even if we assume that $w_i$ has zero mean, is serially independent, and is uncorrelated with $u_i$, we can no longer assume that the composite error term $z_i$ is independent of the explanatory variable $X_i$, because (assuming $E(z_i) = 0$)

$$\begin{aligned} \operatorname{cov}(z_i, X_i) &= E[z_i - E(z_i)][X_i - E(X_i)] \\ &= E(u_i - \beta w_i)(w_i) \qquad \text{using (13.5.7)} \\ &= E(-\beta w_i^2) \\ &= -\beta \sigma_w^2 \end{aligned} \tag{13.5.9}$$


Thus, the explanatory variable and the error term in Eq. (13.5.8) are correlated, which violates the crucial assumption of the classical linear regression model that the explanatory variable is uncorrelated with the stochastic disturbance term. If this assumption is violated, it can be shown that the OLS estimators are not only biased but also inconsistent, that is, they remain biased even if the sample size n increases indefinitely. 29

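For model (13.5.8), it is shown in Appendix 13A, Section 13A.3, that

$$\operatorname{plim} \hat\beta = \beta \left[ \frac{1}{1 + \sigma_w^2 / \sigma_{X^*}^2} \right] \qquad (13.5.10)$$

where $\sigma_w^2$ and $\sigma_{X^*}^2$ are the variances of $w_i$ and $X^*$, respectively, and where $\operatorname{plim} \hat\beta$ means the probability limit of $\hat\beta$.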

Since the term inside the brackets is expected to be less than 1 (why?), Eq. (13.5.10) shows that even if the sample size increases indefinitely, $\hat\beta$ will not converge to $\beta$. Actually, if $\beta$ is assumed positive, $\hat\beta$ will underestimate $\beta$, that is, it is biased toward zero. Of course, if there are no measurement errors in X (i.e., $\sigma_w^2 = 0$), $\hat\beta$ will provide a consistent estimator of $\beta$.
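A small simulation makes Eqs. (13.5.9) and (13.5.10) concrete. The sketch below is only an illustration under stated assumptions and is not the author's procedure. It draws X*, u, and w from normal distributions using hypothetical parameter values. These values set $\sigma_w^2 = \sigma_{X^*}^2$, so the bracketed factor in Eq. (13.5.10) equals 1/2. Here X* is random rather than fixed in repeated samples. Because X* is drawn independently of u and w, both results are unchanged. With a large n, two things should happen:

- the OLS slope of Y on the mismeasured X should settle near $\beta/2$ rather than near $\beta$;
- the sample covariance of $z_i$ and $X_i$ should be close to $-\beta\sigma_w^2$.

The function name `simulate_errors_in_x` is hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate_errors_in_x(alpha=25.0, beta=0.6, sd_xstar=20.0, sd_w=20.0,
                         sd_u=10.0, n=50_000, seed=42):
    """Hypothetical Monte Carlo: true model Y = alpha + beta*X* + u, but only
    X = X* + w is observed; all distributions are assumed normal."""
    rng = random.Random(seed)
    x_star = [rng.gauss(170.0, sd_xstar) for _ in range(n)]          # true regressor X*
    y = [alpha + beta * xs + rng.gauss(0.0, sd_u) for xs in x_star]  # Eq. (13.5.6)
    x = [xs + rng.gauss(0.0, sd_w) for xs in x_star]                 # Eq. (13.5.7)
    z = [yi - alpha - beta * xi for yi, xi in zip(y, x)]             # z_i = u_i - beta*w_i
    return {
        "b_ols": cov(x, y) / cov(x, x),                # OLS slope of Y on observed X
        "plim": beta / (1.0 + sd_w**2 / sd_xstar**2),  # Eq. (13.5.10)
        "cov_zx": cov(z, x),                           # sample cov(z_i, X_i)
        "cov_zx_theory": -beta * sd_w**2,              # Eq. (13.5.9)
    }


print(simulate_errors_in_x())
```

Calling `simulate_errors_in_x(sd_w=0.0)` removes the measurement error. The slope should then come out close to the true β = 0.6, in line with the consistency remark above.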

Therefore, measurement errors pose a serious problem when they are present in the explanatory variable(s) because they make consistent estimation of the parameters impossible. Of course, as we saw, if they are present only in the dependent variable, the estimators remain unbiased and hence they are consistent too. If errors of measurement are present in the explanatory variable(s), what is the solution? The answer is not easy. At one extreme, we can assume that if $\sigma_w^2$ is small compared to $\sigma_{X^*}^2$, for all practical purposes we can "assume away" the problem and proceed with the usual OLS estimation. Of course, the rub here is that we cannot readily observe or measure $\sigma_w^2$ and $\sigma_{X^*}^2$, and therefore there is no way to judge their relative magnitudes.

29 As shown in Appendix A, $\hat\beta$ is a consistent estimator of $\beta$ if, as n increases indefinitely, the sampling distribution of $\hat\beta$ ultimately collapses to the true $\beta$. Technically, this is stated as $\operatorname{plim}_{n \to \infty} \hat\beta = \beta$. As noted in Appendix A, consistency is a large-sample property and is often used to study the behavior of an estimator when its finite or small-sample properties (e.g., unbiasedness) cannot be determined.

EXAMPLE 13.2
An Example

TABLE 13.2
Hypothetical Data on Y* (True Consumption Expenditure), X* (True Income), Y (Measured Consumption Expenditure), and X (Measured Income); All Data in Dollars

One other suggested remedy is the use of instrumental or proxy variables that, although highly correlated with the original X variables, are uncorrelated with the equation and measurement error terms (i.e., $u_i$ and $w_i$). If such proxy variables can be found, then one can obtain a consistent estimate of $\beta$. But this task is much easier said than done. In practice it is not easy to find good proxies; we are often in the situation of complaining about the bad weather without being able to do much about it. Besides, it is not easy to find out if the selected instrumental variable is in fact independent of the error terms $u_i$ and $w_i$.

In the literature there are other suggestions to solve the problem. 30 But most of them are specific to the given situation and are based on restrictive assumptions. There is really no satisfactory answer to the measurement errors problem. That is why it is so crucial to measure the data as accurately as possible.
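The instrumental-variable idea can be made concrete with a simulation. This is a hedged sketch, not a recommended procedure. Purely for illustration, it assumes that a second, independently mismeasured reading of X* is available to serve as the instrument Z. All parameter values and the name `ols_vs_iv` are hypothetical. The simple IV slope cov(Z, Y)/cov(Z, X) should then land near the true β, while OLS on the mismeasured X stays attenuated.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_vs_iv(alpha=25.0, beta=0.6, sd_xstar=20.0, sd_w=20.0, sd_u=10.0,
              sd_e=15.0, n=50_000, seed=7):
    """Compare OLS and a simple IV slope when X is measured with error.

    The instrument Z is hypothetical: a second, independent noisy reading of X*,
    so it is correlated with X but uncorrelated with u and w."""
    rng = random.Random(seed)
    x_star = [rng.gauss(170.0, sd_xstar) for _ in range(n)]
    y = [alpha + beta * xs + rng.gauss(0.0, sd_u) for xs in x_star]
    x = [xs + rng.gauss(0.0, sd_w) for xs in x_star]   # mismeasured regressor
    z = [xs + rng.gauss(0.0, sd_e) for xs in x_star]   # instrument (assumed valid)
    b_ols = cov(x, y) / cov(x, x)                      # attenuated toward zero
    b_iv = cov(z, y) / cov(z, x)                       # IV slope: cov(Z, Y) / cov(Z, X)
    return b_ols, b_iv


print(ols_vs_iv())
```

The whole burden rests on the assumption that Z is uncorrelated with $u_i$ and $w_i$. As the text stresses, that is hard to verify in practice.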


We conclude this section with an example constructed to highlight the preceding points. 

Table 13.2 gives hypothetical data on true consumption expenditure Y*, true income 
X*, measured consumption Y, and measured income X. The table also explains how these 
variables were measured. 31 

Measurement Errors in the Dependent Variable Y Only. Based on the given data, the true consumption function is

$$\hat{Y}_i^* = 25.00 + 0.6000\, X_i^* \qquad (13.5.11)$$

se = (10.477)   (0.0584)
t = (2.3861)   (10.276)      $R^2 = 0.9296$


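| Y* | X* | Y | X | $\varepsilon$ | w | u |
|---|---|---|---|---|---|---|
| 75.4666 | 80.00 | 67.6011 | 80.0940 | -7.8655 | 0.0940 | 2.4666 |
| 74.9801 | 100.00 | 75.4438 | 91.5721 | 0.4636 | -8.4279 | -10.0199 |
| 102.8242 | 120.00 | 109.6956 | 112.1406 | 6.8714 | 2.1406 | 5.8242 |
| 125.7651 | 140.00 | 129.4159 | 145.5969 | 3.6509 | 5.5969 | 16.7651 |
| 106.5035 | 160.00 | 104.2388 | 168.5579 | -2.2647 | 8.5579 | -14.4965 |
| 131.4318 | 180.00 | 125.8319 | 171.4793 | -5.5999 | -8.5207 | -1.5682 |
| 149.3693 | 200.00 | 153.9926 | 203.5366 | 4.6233 | 3.5366 | 4.3693 |
| 143.8628 | 220.00 | 152.9208 | 222.8533 | 9.0579 | 2.8533 | -13.1372 |
| 177.5218 | 240.00 | 176.3344 | 232.9879 | -1.1874 | -7.0120 | 8.5218 |
| 182.2748 | 260.00 | 174.5252 | 261.1813 | -7.7496 | 1.1813 | 1.2748 |

In the tabulated values, $\varepsilon = Y - Y^*$, $w = X - X^*$, and $u = Y^* - 25 - 0.6X^*$.

Notes: (1) $E(u_i) = E(\varepsilon_i) = E(w_i) = 0$; (2) $\operatorname{cov}(X, u) = \operatorname{cov}(X, \varepsilon) = \operatorname{cov}(u, \varepsilon) = \operatorname{cov}(w, u) = \operatorname{cov}(\varepsilon, w) = 0$; (3) $\sigma_u^2 = 100$, $\sigma_\varepsilon^2 = 36$, and $\sigma_w^2 = 36$.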

30 See Thomas B. Fomby, R. Carter Hill, and Stanley R. Johnson, Advanced Econometric Methods, Springer-Verlag, New York, 1984, pp. 273-277. See also Kennedy, op. cit., pp. 138-140, for a discussion of weighted regression as well as instrumental variables. See also G. S. Maddala, Introduction to Econometrics, 3d ed., John Wiley & Sons, New York, 2001, pp. 437-462, and Quirino Paris, "Robust Estimators of Errors-in-Variables Models: Part I," Working Paper No. 04-007, 200, Department of Agricultural and Resource Economics, University of California at Davis, August 2004.

31 I am indebted to Kenneth J. White for constructing this example. See his Computer Handbook Using SHAZAM, for use with Damodar Gujarati, Basic Econometrics, September 1985, pp. 117-121.
If, instead, we use $Y_i$ rather than $Y_i^*$, we obtain

$$\hat{Y}_i = 25.00 + 0.6000\, X_i^* \qquad (13.5.12)$$

se = (12.218)   (0.0681)
t = (2.0461)   (8.8118)      $R^2 = 0.9066$


As these results show, and according to the theory, the estimated coefficients remain 
the same. The only effect of errors of measurement in the dependent variable is that 
the estimated standard errors of the coefficients tend to be larger (see Eq. [13.5.5]), 
which is clearly seen in Eq. (13.5.12). In passing, note that the regression coefficients in 
Eqs. (13.5.11) and (13.5.12) are the same because the sample was generated to match 
the assumptions of the measurement error model. 

Errors of Measurement in X. We know that the true regression is Eq. (13.5.11). Suppose now that instead of using $X_i^*$ we use $X_i$. (Note: In reality $X_i^*$ is rarely observable.) The regression results are as follows:

$$\hat{Y}_i^* = 25.992 + 0.5942\, X_i \qquad (13.5.13)$$

se = (11.0810)   (0.0617)
t = (2.3457)   (9.6270)      $R^2 = 0.9205$

These results are in accord with the theory: when there are measurement errors in the explanatory variable(s), the estimated coefficients are biased. Fortunately, in this example the bias is rather small. From Eq. (13.5.10) it is evident that the bias depends on $\sigma_w^2/\sigma_{X^*}^2$. In generating the data it was assumed that $\sigma_w^2 = 36$ and $\sigma_{X^*}^2 = 3667$, which makes the bias factor rather small, about 0.98 percent (= 36/3667). Indeed, Eq. (13.5.10) gives $0.6000/(1 + 36/3667) \approx 0.5942$, which is the slope estimated in Eq. (13.5.13).

We leave it to the reader to find out what happens when there are errors of measurement in both Y and X, that is, if we regress $Y_i$ on $X_i$ rather than $Y_i^*$ on $X_i^*$ (see Exercise 13.23).


13.6 Incorrect Specification of the Stochastic Error Term 

A common problem facing a researcher is the specification of the error term $u_i$ that enters the regression model. Since the error term is not directly observable, there is no easy way to determine the form in which it enters the model. To see this, let us return to the models given in Eqs. (13.2.8) and (13.2.9). For simplicity of exposition, we have assumed that there is no intercept in the model. We further assume that $u_i$ in Eq. (13.2.8) is such that $\ln u_i$ satisfies the usual OLS assumptions.

If we assume that Eq. (13.2.8) is the "correct" model but estimate Eq. (13.2.9), what are the consequences? It is shown in Appendix 13A, Section 13A.4, that if $\ln u_i \sim N(0, \sigma^2)$, then

$$u_i \sim \text{lognormal}\left[e^{\sigma^2/2},\; e^{\sigma^2}\left(e^{\sigma^2} - 1\right)\right] \qquad (13.6.1)$$

As a result,

$$E(\hat\alpha) = \beta e^{\sigma^2/2} \qquad (13.6.2)$$

where e is the base of the natural logarithm.
As you can see, $\hat\alpha$ is a biased estimator, as its average value is not equal to the true $\beta$. We will have more to say about the specification of the stochastic error term in the chapter on nonlinear-in-the-parameter regression models.
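Eq. (13.6.2) can be checked by simulation. The sketch below is illustrative only and rests on these assumptions:

- hypothetical values β = 2 and σ = 0.5;
- X drawn uniformly on [1, 10] and held fixed across replications;
- Y generated from the multiplicative model (13.2.8), while the additive no-intercept model (13.2.9) is fitted by OLS.

The average of $\hat\alpha$ across replications should be close to $\beta e^{\sigma^2/2} \approx 2.266$ rather than to β = 2. The function names are hypothetical.

```python
import math
import random


def slope_through_origin(x, y):
    """OLS estimator of alpha in Y_i = alpha*X_i + u_i (no intercept): sum(XY)/sum(X^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def mean_alpha_hat(beta=2.0, sigma=0.5, n=200, reps=1_000, seed=3):
    """Hypothetical Monte Carlo: data come from Y = beta * X * u with ln u ~ N(0, sigma^2)
    (Eq. 13.2.8), but the additive model Y = alpha*X + u (Eq. 13.2.9) is fitted."""
    rng = random.Random(seed)
    x = [rng.uniform(1.0, 10.0) for _ in range(n)]     # regressor held fixed across samples
    total = 0.0
    for _ in range(reps):
        y = [beta * xi * math.exp(rng.gauss(0.0, sigma)) for xi in x]
        total += slope_through_origin(x, y)
    return total / reps, beta * math.exp(sigma**2 / 2)  # (simulated mean, Eq. 13.6.2)


print(mean_alpha_hat())
```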


13.7 Nested versus Non-Nested Models 


In carrying out specification testing, it is useful to distinguish between nested and non-nested models. To distinguish between the two, consider the following models:

Model A: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + u_i$

Model B: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$

We say that Model B is nested in Model A because it is a special case of Model A. If we estimate Model A, test the hypothesis that $\beta_4 = \beta_5 = 0$, and do not reject it on the basis of, say, the F test, 32 then Model A reduces to Model B. If we add variable X4 to Model B, then Model A will reduce to this augmented Model B if $\beta_5$ is zero; here we will use the t test to test the hypothesis that the coefficient of X5 is zero.

Without calling them such, the specification error tests that we have discussed previously and the restricted F test that we discussed in Chapter 8 are essentially tests of nested hypotheses.

Now consider the following models: 

Model C: $Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + u_i$

Model D: $Y_i = \beta_1 + \beta_2 Z_{2i} + \beta_3 Z_{3i} + v_i$

where the X's and Z's are different variables. We say that Models C and D are non-nested because one cannot be derived as a special case of the other. In economics, as in other sciences, more than one competing theory may explain a phenomenon. Thus, the monetarists would emphasize the role of money in explaining changes in GDP, whereas the Keynesians may explain them by changes in government expenditure.

It may be noted here that one can allow Models C and D to contain regressors that are common to both. For example, X3 could be included in Model D and Z2 could be included in Model C. Even then these are non-nested models, because Model C does not contain Z3 and Model D does not contain X2.

Even if the same variables enter the model, the functional form may make two models 
non-nested. For example, consider the model: 

Model E: $Y_i = \beta_1 + \beta_2 \ln Z_{2i} + \beta_3 \ln Z_{3i} + w_i$

Models D and E are non-nested, as one cannot be derived as a special case of the other.

Since we already have looked at tests of nested models (t and F tests), in the following section we discuss some of the tests of non-nested models, which earlier we called model mis-specification errors.


32 More generally, one can use the likelihood ratio test, or the Wald test or the Lagrange Multiplier 
test, which were discussed briefly in Chapter 8. 
13.8 Tests of Non-Nested Hypotheses

According to Harvey, 33 there are two approaches to testing non-nested hypotheses: (1) the discrimination approach, where given two or more competing models, one chooses a model based on some criteria of goodness of fit, and (2) the discerning approach (our terminology) where, in investigating one model, we take into account information provided by other models. We consider these approaches briefly.

The Discrimination Approach 

Consider Models C and D in Section 13.7. Since both models involve the same dependent variable, we can choose between two (or more) models based on some goodness-of-fit criterion, such as $R^2$ or adjusted $R^2$, which we have already discussed. But keep in mind that in comparing two or more models, the regressand must be the same. Besides these criteria, there are other criteria that are also used. These include Akaike's information criterion (AIC), Schwarz's information criterion (SIC), and Mallows's $C_p$ criterion. We discuss these criteria in Section 13.9. Most modern statistical software packages have one or more of these criteria built into their regression routines. In the last section of this chapter, we will illustrate these criteria using an extended example. On the basis of one or more of these criteria a model is finally selected that has the highest $R^2$ or the lowest value of AIC or SIC, etc.

The Discerning Approach 

The Non-Nested F Test or Encompassing F Test 

Consider Models C and D introduced in Section 13.7. How do we choose between the two models? For this purpose suppose we estimate the following nested or hybrid model:

Model F: $Y_i = \lambda_1 + \lambda_2 X_{2i} + \lambda_3 X_{3i} + \lambda_4 Z_{2i} + \lambda_5 Z_{3i} + u_i$

Notice that Model F nests or encompasses Models C and D. But note that C is not nested in 
D and D is not nested in C, so they are non-nested models. 

Now if Model C is correct, $\lambda_4 = \lambda_5 = 0$, whereas Model D is correct if $\lambda_2 = \lambda_3 = 0$. This testing can be done by the usual F test, hence the name non-nested F test.
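The "usual F test" referred to here can be computed from the residual sums of squares of the restricted and encompassing regressions. The sketch below uses the RSS form of the restricted F statistic from Chapter 8. The RSS values and the name `restricted_f` are invented for illustration. Each statistic would then be compared with an F(m, n − k) critical value from a table.

```python
def restricted_f(rss_restricted, rss_unrestricted, m, n, k):
    """Restricted F statistic from residual sums of squares:
    F = [(RSS_R - RSS_UR) / m] / [RSS_UR / (n - k)],
    with m restrictions and k parameters in the unrestricted (encompassing) model."""
    return ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / (n - k))


# Hypothetical RSS values; n = 40 observations, Model F has k = 5 parameters.
n, k = 40, 5
rss_c, rss_d, rss_f = 130.0, 118.0, 105.0
f_c = restricted_f(rss_c, rss_f, m=2, n=n, k=k)  # tests lambda_4 = lambda_5 = 0 (Model C adequate?)
f_d = restricted_f(rss_d, rss_f, m=2, n=n, k=k)  # tests lambda_2 = lambda_3 = 0 (Model D adequate?)
print(round(f_c, 3), round(f_d, 3))              # prints 4.167 2.167
```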

However, there are problems with this testing procedure. First, if the X's and the Z's are highly correlated, then, as noted in the chapter on multicollinearity, it is quite likely that one or more of the $\lambda$'s are individually statistically insignificant, although on the basis of the F test one can reject the hypothesis that all the slope coefficients are simultaneously zero. In this case, we have no way of deciding whether Model C or Model D is the correct model. Second, there is another problem. Suppose we choose Model C as the reference hypothesis or model, and find that all its coefficients are significant. Now we add Z2 or Z3 or both to the model and find, using the F test, that their incremental contribution to the explained sum of squares (ESS) is statistically insignificant. Therefore, we decide to choose Model C.

But suppose we had instead chosen Model D as the reference model and found that all its coefficients were statistically significant. But when we add X2 or X3 or both to this model, we find, again using the F test, that their incremental contribution to ESS is insignificant. Therefore, we would have chosen Model D as the correct model. Hence, "the choice of the reference hypothesis could determine the outcome of the choice model," 34 especially if severe multicollinearity is present in the competing regressors. Finally, the artificially nested Model F may not have any economic meaning.


33 Andrew Harvey, The Econometric Analysis of Time Series, 2d ed., The MIT Press, Cambridge, Mass., 
1990, Chapter 5. 

34 Thomas B. Fomby, R. Carter Hill, and Stanley R. Johnson, Advanced Econometric Methods, Springer 
Verlag, New York, 1984, p. 416. 
EXAMPLE 13.3

An Illustrative 
Example: The 
St. Louis Model 


To determine whether changes in nominal GNP can be explained by changes in the 
money supply (monetarism) or by changes in government expenditure (Keynesianism), 
we consider the following models: 

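$$
\begin{aligned}
Y_t &= \alpha + \beta_0 M_t + \beta_1 M_{t-1} + \beta_2 M_{t-2} + \beta_3 M_{t-3} + \beta_4 M_{t-4} + u_{1t} \\
    &= \alpha + \sum_{i=0}^{4} \beta_i M_{t-i} + u_{1t}
\end{aligned} \qquad (13.8.1)
$$

$$
\begin{aligned}
Y_t &= \gamma + \lambda_0 E_t + \lambda_1 E_{t-1} + \lambda_2 E_{t-2} + \lambda_3 E_{t-3} + \lambda_4 E_{t-4} + u_{2t} \\
    &= \gamma + \sum_{i=0}^{4} \lambda_i E_{t-i} + u_{2t}
\end{aligned} \qquad (13.8.2)
$$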


where $Y_t$ = rate of growth in nominal GNP at time t
$M_t$ = rate of growth in the money supply (M1 version) at time t
$E_t$ = rate of growth in full, or high, employment government expenditure at time t


In passing, note that Eqs. (13.8.1) and (13.8.2) are examples of distributed-lag models, a topic thoroughly discussed in Chapter 17. For the time being, simply note that the effect of a unit change in the money supply or government expenditure on GNP is distributed over a period of time and is not instantaneous.

Since a priori it may be difficult to decide between the two competing models, let us enmesh the two models as shown below:

$$Y_t = \text{constant} + \sum_{i=0}^{4} \beta_i M_{t-i} + \sum_{i=0}^{4} \lambda_i E_{t-i} + u_{3t} \qquad (13.8.3)$$

This nested model is one form in which the famous (Federal Reserve Bank of) St. Louis model, a pro-monetary-school bank, has been expressed and estimated. The results of this model for the period 1953-I to 1976-IV for the United States are as follows (t ratios in parentheses): 35


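| Coefficient | Estimate | Coefficient | Estimate |
|---|---|---|---|
| $\beta_0$ | 0.40 (2.96) | $\lambda_0$ | 0.08 (2.26) |
| $\beta_1$ | 0.41 (5.26) | $\lambda_1$ | 0.06 (2.52) |
| $\beta_2$ | 0.25 (2.14) | $\lambda_2$ | 0.00 (0.02) |
| $\beta_3$ | 0.06 (0.71) | $\lambda_3$ | -0.06 (-2.20) |
| $\beta_4$ | -0.05 (-0.37) | $\lambda_4$ | -0.07 (-1.83) |
| $\sum \beta_i$ | 1.06 (5.59) | $\sum \lambda_i$ | 0.03 (0.40) |

$R^2 = 0.40$   d = 1.78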


What do these results suggest about the superiority of one model over the other? If we consider the cumulative effect of a unit change in M and E on Y, we obtain, respectively, $\sum_{i=0}^{4} \beta_i = 1.06$ and $\sum_{i=0}^{4} \lambda_i = 0.03$, the former being statistically significant and the latter not. This comparison would tend to support the monetarist claim that it is changes in the money supply that determine changes in the (nominal) GNP. It is left as an exercise for the reader to critically evaluate this claim.


35 See Keith M. Carlson, "Does the St. Louis Equation Now Believe in Fiscal Policy?" Review, Federal 
Reserve Bank of St. Louis, vol. 60, no. 2, February 1978, p. 17, table IV. 
Davidson-MacKinnon J Test 36

Because of the problems just listed in the non-nested F testing procedure, alternatives have been suggested. One is the Davidson-MacKinnon J test. To illustrate this test, suppose we want to compare hypothesis or Model C with hypothesis or Model D. The J test proceeds as follows:

1. We estimate Model D and from it we obtain the estimated Y values, $\hat{Y}_i^D$.

2. We add the predicted Y value in Step 1 as an additional regressor to Model C and estimate the following model:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 \hat{Y}_i^D + u_i \qquad (13.8.5)$$

where the $\hat{Y}_i^D$ values are obtained from Step 1. This model is an example of the encompassing principle, as in the Hendry methodology.

3. Using the t test, test the hypothesis that $\alpha_4 = 0$.

4. If the hypothesis that $\alpha_4 = 0$ is not rejected, we can accept (i.e., not reject) Model C as the true model. The reason is that $\hat{Y}_i^D$ included in Eq. (13.8.5), which represents the influence of variables not included in Model C, has no additional explanatory power beyond that contributed by Model C. In other words, Model C encompasses Model D in the sense that the latter model does not contain any additional information that will improve the performance of Model C. By the same token, if the null hypothesis is rejected, Model C cannot be the true model (why?).

5. Now we reverse the roles of hypotheses, or Models C and D. We now estimate Model C first, use the estimated Y values from this model as an additional regressor in Model D, repeat Step 4, and decide whether to accept Model D over Model C. More specifically, we estimate the following model:

$$Y_i = \beta_1 + \beta_2 Z_{2i} + \beta_3 Z_{3i} + \beta_4 \hat{Y}_i^C + u_i \qquad (13.8.6)$$

where $\hat{Y}_i^C$ are the estimated Y values from Model C. We now test the hypothesis that $\beta_4 = 0$. If this hypothesis is not rejected, we choose Model D over C. If the hypothesis that $\beta_4 = 0$ is rejected, we choose C over D, as the latter does not improve over the performance of C.
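The five steps can be sketched in plain Python as follows. The `ols` and `j_test` helpers are our own illustrative names and are not a library API. `ols` adds an intercept and inverts X'X by Gauss-Jordan elimination. That is adequate for a handful of regressors but is not a numerically robust replacement for a statistics package. The data are simulated, hypothetically, from Model D. The t statistic on $\hat{Y}^D$ in Eq. (13.8.5) should therefore be very large, pointing to rejection of Model C.

```python
import math
import random


def ols(y, cols):
    """OLS of y on an intercept plus the regressor lists in `cols`.
    Returns (coefficients, standard errors, fitted values)."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]                                    # (X'X)^(-1)
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    b = [sum(inv[a][j] * xty[j] for j in range(k)) for a in range(k)]
    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in X]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - k)   # sigma-hat squared
    se = [math.sqrt(s2 * inv[a][a]) for a in range(k)]
    return b, se, fitted


def j_test(y, model_c, model_d):
    """Davidson-MacKinnon J test; returns (t for alpha_4, t for beta_4)."""
    _, _, fit_d = ols(y, model_d)             # Step 1: fitted Y from Model D
    b, se, _ = ols(y, model_c + [fit_d])      # Step 2: Eq. (13.8.5)
    t_alpha4 = b[-1] / se[-1]                 # Step 3: t statistic for alpha_4 = 0
    _, _, fit_c = ols(y, model_c)             # Step 5: reverse the roles
    b, se, _ = ols(y, model_d + [fit_c])      # Eq. (13.8.6)
    t_beta4 = b[-1] / se[-1]
    return t_alpha4, t_beta4


# Hypothetical data generated by Model D (the Z's matter, the X's do not).
rng = random.Random(0)
n = 200
x2, x3, z2, z3 = ([rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(4))
y = [1.0 + 2.0 * a + 1.5 * b + rng.gauss(0.0, 0.5) for a, b in zip(z2, z3)]
print(j_test(y, [x2, x3], [z2, z3]))
```

With real data, the pair of t statistics would be read against the four outcomes tabulated below.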

Although it is intuitively appealing, the J test has some problems. Since the tests given in Eqs. (13.8.5) and (13.8.6) are performed independently, we have the following likely outcomes:

| Hypothesis: $\beta_4 = 0$ | Hypothesis $\alpha_4 = 0$: Do not reject | Hypothesis $\alpha_4 = 0$: Reject |
|---|---|---|
| Do not reject | Accept both C and D | Accept D, reject C |
| Reject | Accept C, reject D | Reject both C and D |


As this table shows, we will not be able to get a clear answer if the J testing procedure leads to the acceptance or rejection of both models. In case both models are rejected, neither model helps us to explain the behavior of Y. Similarly, if both models are accepted, as Kmenta notes, "the data are apparently not rich enough to discriminate between the two hypotheses [models]." 37

36 R. Davidson and J. G. MacKinnon, "Several Tests for Model Specification in the Presence of Alternative Hypotheses," Econometrica, vol. 49, 1981, pp. 781-793.

37 Jan Kmenta, op. cit., p. 597.





Another problem with the J test is that when we use the t statistic to test the significance of the estimated Y variable in models (13.8.5) and (13.8.6), the t statistic has the standard normal distribution only asymptotically, that is, in large samples. Therefore, the J test may not be very reliable in small samples, because it tends to reject the true hypothesis or model more frequently than it ought to.

EXAMPLE 13.4
Personal Consumption Expenditure and Disposable Personal Income

TABLE 13.3
Per Capita Personal Consumption Expenditure (PPCE) and per Capita Personal Disposable Income (PDPI), U.S., 1970-2005
Source: Economic Report of the President, 2007.

To illustrate the J test, consider the data given in Table 13.3. This table gives data on per capita personal consumption expenditure (PPCE) and per capita disposable personal income (PDPI), both measured in current (2008) dollars for the United States for the period 1970-2005. Consider the following rival models:

Model A: $\text{PPCE}_t = \alpha_1 + \alpha_2 \text{PDPI}_t + \alpha_3 \text{PDPI}_{t-1} + u_t$   (13.8.7)

Model B: $\text{PPCE}_t = \beta_1 + \beta_2 \text{PDPI}_t + \beta_3 \text{PPCE}_{t-1} + u_t$   (13.8.8)

Model A states that PPCE depends on PDPI in the current and previous time period; this model is an example of what is known as the distributed-lag model (see Chapter 17). Model B postulates that PPCE depends on current PDPI as well as PPCE in the previous time period; this model represents what is known as the autoregressive model (see Chapter 17 again). The reason for introducing the lagged value of PPCE in this model is to reflect inertia or habit persistence.

The results of estimating these models separately were as follows: 

Model A:

$$\widehat{\text{PPCE}}_t = -606.6347 + 0.6170\, \text{PDPI}_t + 0.3530\, \text{PDPI}_{t-1} \qquad (13.8.9)$$

t = (-3.8334)   (2.5706)   (1.4377)
$R^2 = 0.9983$   d = 0.2161

Model B:

$$\widehat{\text{PPCE}}_t = 76.8947 + 0.2074\, \text{PDPI}_t + 0.8104\, \text{PPCE}_{t-1} \qquad (13.8.10)$$

t = (0.7256)   (2.6734)   (9.7343)
$R^2 = 0.9996$   d = 0.9732


| Year | PPCE | PDPI | Year | PPCE | PDPI |
|---|---|---|---|---|---|
| 1970 | 3,162 | 3,587 | 1988 | 13,685 | 15,297 |
| 1971 | 3,379 | 3,860 | 1989 | 14,546 | 16,257 |
| 1972 | 3,671 | 4,140 | 1990 | 15,349 | 17,131 |
| 1973 | 4,022 | 4,616 | 1991 | 15,722 | 17,609 |
| 1974 | 4,364 | 5,010 | 1992 | 16,485 | 18,494 |
| 1975 | 4,789 | 5,498 | 1993 | 17,204 | 18,872 |
| 1976 | 5,282 | 5,972 | 1994 | 18,004 | 19,555 |
| 1977 | 5,804 | 6,517 | 1995 | 18,665 | 20,287 |
| 1978 | 6,417 | 7,224 | 1996 | 19,490 | 21,091 |
| 1979 | 7,073 | 7,967 | 1997 | 20,323 | 21,940 |
| 1980 | 7,716 | 8,822 | 1998 | 21,291 | 23,161 |
| 1981 | 8,439 | 9,765 | 1999 | 22,491 | 23,968 |
| 1982 | 8,945 | 10,426 | 2000 | 23,862 | 25,472 |
| 1983 | 9,775 | 11,131 | 2001 | 24,722 | 26,235 |
| 1984 | 10,589 | 12,319 | 2002 | 25,501 | 27,164 |
| 1985 | 11,406 | 13,037 | 2003 | 26,463 | 28,039 |
| 1986 | 12,048 | 13,649 | 2004 | 27,937 | 29,536 |
| 1987 | 12,766 | 14,241 | 2005 | 29,468 | 30,458 |



If one were to choose between these two models on the basis of the discrimination approach, using the highest $R^2$ criterion, one would probably choose Model B (13.8.10), because its $R^2$ is just slightly higher than that of Model A (13.8.9). Also, in Model B (13.8.10), both variables are individually statistically significant, whereas in Model A (13.8.9) only the current PDPI is statistically significant (there might be a collinearity problem, though). For predictive purposes, though, there is not much difference between the two estimated $R^2$ values.

To apply the J test, suppose we assume Model A is the null hypothesis, or the maintained model, and Model B is the alternative hypothesis. Following the J test steps discussed earlier, we use the estimated PPCE values from model (13.8.10) as an additional regressor in Model A. The following is the outcome from this regression:

$$\widehat{\text{PPCE}}_t = -35.17 + 0.2762\, \text{PDPI}_t - 0.5141\, \text{PDPI}_{t-1} + 1.2351\, \text{PPCE}_t^{B} \qquad (13.8.11)$$

t = (-0.43)   (2.60)   (-4.05)   (12.06)
$R^2 = 1.00$   d = 1.5205

where $\text{PPCE}_t^{B}$ on the right-hand side of Eq. (13.8.11) represents the estimated PPCE values from the original Model B (13.8.10). Since the coefficient of this variable is statistically significant with a very high t statistic of 12.06, following the J test procedure we have to reject Model A in favor of Model B.

Now we will assume Model B is the maintained hypothesis and Model A is the alternative. Following the exact same procedure, we obtain the following results:

$$\widehat{\text{PPCE}}_t = -823.7 + 1.4309\, \text{PDPI}_t + 1.0009\, \text{PPCE}_{t-1} - 1.4563\, \text{PPCE}_t^{A} \qquad (13.8.12)$$

t = (-3.45)   (4.64)   (12.06)   (-4.05)
$R^2 = 1.00$   d = 1.5205

where $\text{PPCE}_t^{A}$ on the right-hand side of Eq. (13.8.12) represents the estimated PPCE values from the original Model A (13.8.9). In this regression, the coefficient of $\text{PPCE}_t^{A}$ is also statistically significant, with a t statistic of -4.05. This result suggests that we should now reject Model B in favor of Model A.

All this tells us is that neither model is particularly useful in explaining the behavior of per capita personal consumption expenditure in the United States over the period 1970-2005. Of course, we have considered only two competing models. In reality, there may be more than two models. The J test procedure can be extended to multiple model comparisons, although the analysis can quickly become complex.

This example shows very vividly why the CLRM assumes that the regression model 
used in the analysis is correctly specified. Obviously, in developing a model it is crucial to 
pay very careful attention to the phenomenon being modeled. 


Other Tests of Model Selection 

The J test just discussed is only one of a group of tests of model selection. There is the Cox test, the JA test, the P test, the Mizon-Richard encompassing test, and variants of these tests. Obviously, we cannot hope to discuss these specialized tests, for which the reader may want to consult the references cited in the various footnotes. 38


38 See also Badi H. Baltagi, Econometrics, Springer, New York, 1998, pp. 209-222. 
13.9 Model Selection Criteria


In this section we discuss several criteria that have been used to choose among competing 
models and/or to compare models for forecasting purposes. Here we distinguish between 
in-sample forecasting and out-of-sample forecasting. In-sample forecasting essentially 
tells us how the chosen model fits the data in a given sample. Out-of-sample forecasting is 
concerned with determining how a fitted model forecasts future values of the regressand, 
given the values of the regressors. 

Several criteria are used for this purpose. In particular, we discuss these criteria:

- (1) $R^2$;
- (2) adjusted $R^2$ (= $\bar{R}^2$);
- (3) Akaike's information criterion (AIC);
- (4) Schwarz's information criterion (SIC);
- (5) Mallows's $C_p$ criterion;
- (6) forecast $\chi^2$ (chi-square).

All these criteria aim at minimizing the residual sum of squares (RSS) (or increasing the $R^2$ value). However, except for the first criterion, criteria (2), (3), (4), and (5) impose a penalty for including an increasingly large number of regressors. Thus there is a trade-off between goodness of fit of the model and its complexity (as judged by the number of regressors).


The $R^2$ Criterion

We know that one of the measures of goodness of fit of a regression model is $R^2$, which, as we know, is defined as:

$$R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}} \qquad (13.9.1)$$


$R^2$, thus defined, of necessity lies between 0 and 1. The closer it is to 1, the better is the fit. But there are problems with $R^2$. First, it measures in-sample goodness of fit in the sense of how close an estimated Y value is to its actual value in the given sample. There is no guarantee that it will forecast well out-of-sample observations. Second, in comparing two or more $R^2$'s, the dependent variable, or regressand, must be the same. Third, and more importantly, an $R^2$ cannot fall when more variables are added to the model. Therefore, there is every temptation to play the game of "maximizing the $R^2$" by simply adding more variables to the model. Of course, adding more variables to the model may increase $R^2$ but it may also increase the variance of forecast error.


Adjusted $R^2$

As a penalty for adding regressors to increase the $R^2$ value, Henry Theil developed the adjusted $R^2$, denoted by $\bar{R}^2$, which we studied in Chapter 7. Recall that

$$\bar{R}^2 = 1 - \frac{\text{RSS}/(n - k)}{\text{TSS}/(n - 1)} \qquad (13.9.2)$$


As you can see from this formula, $\bar{R}^2 \le R^2$, showing how the adjusted $R^2$ penalizes for adding more regressors. As we noted in Chapter 8, unlike $R^2$, the adjusted $R^2$ will increase only if the absolute t value of the added variable is greater than 1. For comparative purposes, therefore, $\bar{R}^2$ is a better measure than $R^2$. But again keep in mind that the regressand must be the same for the comparison to be valid.
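A short numerical sketch of Eqs. (13.9.1) and (13.9.2), with invented numbers, is given below. Adding a fourth parameter that lowers RSS only from 200 to 198 (n = 30, TSS = 1000) raises $R^2$ but lowers $\bar{R}^2$. This is consistent with the |t| > 1 rule just mentioned. Here $t^2 = (200 - 198)/(198/26) \approx 0.26$, so the added variable's |t| is only about 0.51.

```python
def r_squared(rss, tss):
    """Eq. (13.9.1): R^2 = 1 - RSS/TSS."""
    return 1.0 - rss / tss


def adjusted_r_squared(rss, tss, n, k):
    """Eq. (13.9.2): adjusted R^2 = 1 - [RSS/(n - k)] / [TSS/(n - 1)]."""
    return 1.0 - (rss / (n - k)) / (tss / (n - 1))


# Hypothetical comparison: a 4th parameter lowers RSS only slightly.
n, tss = 30, 1000.0
small = (r_squared(200.0, tss), adjusted_r_squared(200.0, tss, n, k=3))
large = (r_squared(198.0, tss), adjusted_r_squared(198.0, tss, n, k=4))
print(small, large)
```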
Akaike's Information Criterion (AIC)

The idea of imposing a penalty for adding regressors to the model has been carried further in the AIC criterion, which is defined as:

$$\text{AIC} = e^{2k/n} \frac{\sum \hat{u}_i^2}{n} = e^{2k/n} \frac{\text{RSS}}{n} \qquad (13.9.3)$$

where k is the number of regressors (including the intercept) and n is the number of observations. For mathematical convenience, Eq. (13.9.3) is written as

$$\ln \text{AIC} = \left(\frac{2k}{n}\right) + \ln\left(\frac{\text{RSS}}{n}\right) \qquad (13.9.4)$$


where ln AIC = natural log of AIC and 2k/n = penalty factor. Some textbooks and software packages define AIC only in terms of its log transform, so there is no need to put ln before AIC. As you see from this formula, AIC imposes a harsher penalty than $\bar{R}^2$ for adding more regressors. In comparing two or more models, the model with the lowest value of AIC is preferred. One advantage of AIC is that it is useful for judging not only the in-sample but also the out-of-sample forecasting performance of a regression model. Also, it is useful for both nested and non-nested models. It also has been used to determine the lag length in an AR(p) model.
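Eq. (13.9.4) takes one line to compute. In the hypothetical comparison below, both models share the same regressand and n = 30. The second model lowers RSS slightly but pays a larger penalty 2k/n, so it ends up with the higher (worse) ln AIC. The function name `ln_aic` is our own.

```python
import math


def ln_aic(rss, n, k):
    """Eq. (13.9.4): ln AIC = 2k/n + ln(RSS/n); k counts all parameters, intercept included."""
    return 2.0 * k / n + math.log(rss / n)


# Two hypothetical models fitted to the same regressand with n = 30.
model_1 = ln_aic(rss=200.0, n=30, k=3)
model_2 = ln_aic(rss=198.0, n=30, k=4)
print(round(model_1, 4), round(model_2, 4))   # the lower value is preferred
```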

Schwarz's Information Criterion (SIC) 

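Similar in spirit to the AIC, the SIC criterion is defined as:

$$\text{SIC} = n^{k/n} \frac{\sum \hat{u}_i^2}{n} = n^{k/n} \frac{\text{RSS}}{n} \qquad (13.9.5)$$

or in log-form:

$$\ln \text{SIC} = \frac{k}{n} \ln n + \ln\left(\frac{\text{RSS}}{n}\right) \qquad (13.9.6)$$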


where $[(k/n) \ln n]$ is the penalty factor. SIC imposes a harsher penalty than AIC whenever $\ln n > 2$ (that is, for $n \ge 8$), as is obvious from comparing Eq. (13.9.6) to Eq. (13.9.4). Like AIC, the lower the value of SIC, the better the model. Again, like AIC, SIC can be used to compare in-sample or out-of-sample forecasting performance of a model.
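The sketch below evaluates Eq. (13.9.6) and compares the two penalty terms for k = 3 at a few hypothetical sample sizes. The penalty (k/n) ln n exceeds 2k/n exactly when ln n > 2. SIC's penalty is therefore the heavier one for n ≥ 8, but not for a very small sample such as n = 5. The helper names are our own.

```python
import math


def ln_sic(rss, n, k):
    """Eq. (13.9.6): ln SIC = (k/n) ln n + ln(RSS/n)."""
    return (k / n) * math.log(n) + math.log(rss / n)


def penalties(n, k):
    """Penalty terms of ln AIC (2k/n) and ln SIC ((k/n) ln n)."""
    return 2.0 * k / n, (k / n) * math.log(n)


for n in (5, 8, 30, 500):
    aic_pen, sic_pen = penalties(n, k=3)
    print(n, round(aic_pen, 4), round(sic_pen, 4), sic_pen > aic_pen)
```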


Mallows's $C_p$ Criterion

Suppose we have a model consisting of k regressors, including the intercept. Let $\hat\sigma^2$ (computed from this full model with k regressors) as usual be the estimator of the true $\sigma^2$. But suppose that we only choose p regressors ($p \le k$) and obtain the RSS from the regression using these p regressors. Let $\text{RSS}_p$ denote the residual sum of squares using the p regressors. Now C. P. Mallows has developed the following criterion for model selection, known as the $C_p$ criterion:

$$C_p = \frac{\text{RSS}_p}{\hat\sigma^2} - (n - 2p) \qquad (13.9.7)$$


where n is the number of observations. 

We know that $\hat\sigma^2$ is an unbiased estimator of the true $\sigma^2$, that is, $E(\hat\sigma^2) = \sigma^2$. Now, if the model with p regressors is adequate in that it does not suffer from lack of fit, it can be shown 39 that $E(\text{RSS}_p) = (n - p)\sigma^2$. In consequence, it is true approximately that

$$E(C_p) \approx \frac{(n - p)\sigma^2}{\sigma^2} - (n - 2p) \approx p \qquad (13.9.8)$$

In choosing a model according to the $C_p$ criterion, we would look for a model that has a low
$C_p$ value, about equal to p. In other words, following the principle of parsimony, we will
choose a model with p regressors (p < k) that gives a fairly good fit to the data.

In practice, one usually plots $C_p$ computed from Eq. (13.9.7) against p. An "adequate"
model will show up as a point close to the $C_p = p$ line, as can be seen from Figure 13.3. As
this figure shows, Model A may be preferable to Model B, as it is closer to the $C_p = p$ line
than Model B.
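The computation in Eq. (13.9.7) is easy to script. The sketch below assumes NumPy, estimates $\hat{\sigma}^2$ from the full model with all k regressors, and evaluates $C_p$ for a few candidate subsets on simulated data (the data-generating process is hypothetical; x3 is irrelevant by construction). For the full model itself $C_p = k$ exactly, since $\text{RSS}_k/\hat{\sigma}^2 = n - k$.

```python
import numpy as np


def rss(X: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def mallows_cp(X_full: np.ndarray, cols: list, y: np.ndarray) -> float:
    """Mallows's Cp, Eq. (13.9.7), for the columns `cols` of X_full.

    sigma^2 is estimated from the full model with all k regressors;
    p counts the columns kept, including the intercept column.
    """
    n, k = X_full.shape
    p = len(cols)
    sigma2_hat = rss(X_full, y) / (n - k)
    return rss(X_full[:, cols], y) / sigma2_hat - (n - 2 * p)


# Hypothetical data: x3 plays no role in generating y.
rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])   # k = 4, intercept included

for cols in ([0, 1, 2, 3], [0, 1, 2], [0, 2, 3]):
    print(f"columns {cols}: p = {len(cols)}, Cp = {mallows_cp(X, cols, y):.2f}")
```

Plotting such values against p gives a $C_p$ plot of the kind shown in Figure 13.3: the subset that drops the relevant regressor x1 lies far above the $C_p = p$ line.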


FIGURE 13.3 Mallows's $C_p$ plot.


A Word of Caution about Model Selection Criteria 

We have discussed several model selection criteria. But one should look at these criteria as
an adjunct to the various specification tests we have discussed in this chapter. Some of the
criteria discussed above are purely descriptive and may not have strong theoretical properties.
Some of them may even be open to the charge of data mining. Nonetheless, they are
so frequently used by practitioners that the reader should be aware of them. No one of
these criteria is necessarily superior to the others. 40 Most modern software packages now
include $R^2$, adjusted $R^2$, AIC, and SIC. Mallows's $C_p$ is not routinely given, although it can
be easily computed from its definition.

39 Norman R. Draper and Harry Smith, Applied Regression Analysis, 3d ed., John Wiley & Sons, New
York, 1998, p. 332. See this book for some worked examples of $C_p$.

40 For a useful discussion on this topic, see Francis X. Diebold, Elements of Forecasting, 2d ed., South-Western
Publishing, 2001, pp. 83-89. On balance, Diebold recommends the SIC criterion.

Forecast Chi-Square ($\chi^2$)

Suppose we have a regression model based on n observations and suppose we want to use 
it to forecast the (mean) values of the regressand for an additional t observations. As noted 
elsewhere, it is a good idea to save part of the sample data to see how the estimated model 
forecasts the observations not included in the sample, the postsample period. 

Now the forecast $\chi^2$ test is defined as follows:

$$\text{Forecast } \chi^2 = \frac{\sum_{i=n+1}^{n+t} \hat{u}_i^2}{\hat{\sigma}^2} \tag{13.9.9}$$

where $\hat{u}_i$ is the forecast error made for period i (= n + 1, n + 2, ..., n + t), using the
parameters obtained from the fitted regression and the values of the regressors in the postsample
period, and $\hat{\sigma}^2$ is the usual OLS estimator of $\sigma^2$ based on the fitted regression.

If we hypothesize that the parameter values have not changed between the sample and
postsample periods, it can be shown that the statistic given in Eq. (13.9.9) follows the
chi-square distribution with t degrees of freedom, where t is the number of periods for
which the forecast is made. As Charemza and Deadman note, the forecast $\chi^2$ test has
weak statistical power, meaning that the probability that the test will correctly reject a
false null hypothesis is low; the test should therefore be used as a signal rather than as a
definitive test. 41
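A minimal NumPy sketch of Eq. (13.9.9) follows. It assumes the regressor values for the postsample period are available and that $\hat{\sigma}^2$ is the in-sample OLS estimate with $n - k$ degrees of freedom; the function name is illustrative. If SciPy is installed, `scipy.stats.chi2.sf(stat, t)` gives the corresponding p value.

```python
import numpy as np


def forecast_chi_square(X_in, y_in, X_post, y_post):
    """Forecast chi-square statistic of Eq. (13.9.9).

    Fits OLS on the n in-sample observations, forecasts the t postsample
    observations with those estimates, and returns (statistic, t).
    """
    n, k = X_in.shape
    beta, *_ = np.linalg.lstsq(X_in, y_in, rcond=None)
    resid_in = y_in - X_in @ beta
    sigma2_hat = resid_in @ resid_in / (n - k)    # usual OLS estimator of sigma^2
    forecast_errors = y_post - X_post @ beta      # u_hat_i, i = n+1, ..., n+t
    stat = float(forecast_errors @ forecast_errors / sigma2_hat)
    return stat, len(y_post)


# Tiny illustration: a constant-only model fitted to five points, two forecasts.
stat, t = forecast_chi_square(np.ones((5, 1)), np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
                              np.ones((2, 1)), np.array([4.0, 1.0]))
print(stat, t)
```

Here the in-sample mean is 3, so $\hat{\sigma}^2 = 10/4 = 2.5$ and the two forecast errors (1 and -2) give a statistic of 5/2.5 = 2 on 2 df.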

13.10 Additional Topics in Econometric Modeling 

As noted in the introduction to this chapter, the topic of econometric modeling and diagnostic
testing is so vast and evolving that specialized books are written on this topic. In the
previous section we touched on some major themes in this area. In this section we
consider a few additional features that researchers may find useful in practice. In particular,
we consider these topics: (1) outliers, leverage, and influence; (2) recursive least
squares; and (3) Chow's prediction failure test. Of necessity the discussion of each of
these topics will be brief.

Outliers, Leverage, and Influence 42 

Recall that, in minimizing the residual sum of squares (RSS), OLS gives equal weight to
every observation in the sample. But not every observation has an equal impact on the
regression results, because of the presence of three types of special data points called
outliers, leverage points, and influence points. It is important that we know what they are and
how they affect regression analysis.

In the regression context, an outlier may be defined as an observation with a "large residual."
Recall that $\hat{u}_i = (Y_i - \hat{Y}_i)$; that is, the residual represents the difference (positive or negative)
between the actual value of the regressand and its value estimated from the regression model.


41 Wojciech W. Charemza and Derek F. Deadman, New Directions in Econometric Practice: A General to 
Specific Modelling, Cointegration and Vector Autoregression, 2d ed., Edward Elgar Publishers, 1997, 
p. 30. See also pp. 250-252 for their views on various model selection criteria. 

42 The following discussion is influenced by Chandan Mukherjee, Howard White, and Marc Wuyts,
Econometrics and Data Analysis for Developing Countries, Routledge, New York, 1998, pp. 137-148.
FIGURE 13.4 In each subfigure, the solid line gives the OLS line for all the data and the broken line gives the
OLS line with the outlier (marked in the figure) omitted. In (a), the outlier is near the mean value
of X and has low leverage and little influence on the regression coefficients. In (b), the outlier
is far away from the mean value of X and has high leverage as well as substantial influence on
the regression coefficients. In (c), the outlier has high leverage but low influence on the
regression coefficients because it is in line with the rest of the observations.



When we say that a residual is large, it is in comparison with the other residuals, and very often
such a large residual catches our attention immediately because of its rather large vertical
distance from the estimated regression line. Note that in a data set there may be more than one
outlier. We have already encountered an outlier in Exercise 11.22, where you were
asked to regress percent change in stock prices (Y) on percent change in consumer prices (X)
for a sample of 20 countries. One observation, that relating to Chile, was an outlier.

A data point is said to exert (high) leverage if it is disproportionately distant from the
bulk of the values of a regressor(s). Why does a leverage point matter? It matters because
it is capable of pulling the regression line toward itself, thus distorting the slope of
the regression line. If this actually happens, then we call such a leverage (data) point an
influential point. The removal of such a data point from the sample can dramatically
affect the regression line. Returning to Exercise 11.22, you will see that if you regress Y
on X including the observation for Chile, the slope coefficient is positive and "highly
statistically significant." But if you drop the observation for Chile, the slope coefficient is
practically zero. Thus the Chilean observation has leverage and is also an influential
observation.
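A common way to quantify leverage, not described in the text above, is through the diagonal elements $h_{ii}$ of the hat matrix $H = X(X'X)^{-1}X'$; they sum to the number of estimated parameters, so an observation whose $h_{ii}$ is far above the average $k/n$ is a high-leverage point. The sketch below uses hypothetical data in the spirit of Figure 13.4(b) and compares the OLS slope with and without the distant point, which is how influence shows up in practice.

```python
import numpy as np


def leverage(X: np.ndarray) -> np.ndarray:
    """Diagonal of the hat matrix H = X (X'X)^{-1} X'."""
    q, _ = np.linalg.qr(X)           # H = Q Q' when X has full column rank
    return np.sum(q * q, axis=1)


def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope from an OLS regression of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])


# Hypothetical data: nine points with essentially no relationship,
# plus one point far from the mean of x.
x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 25])
y = np.array([5.0, 4, 6, 5, 4, 6, 5, 4, 6, 20])

h = leverage(np.column_stack([np.ones_like(x), x]))
print("leverage of last point:", round(h[-1], 3), " sum of h:", round(h.sum(), 3))
print("slope with the point   :", round(ols_slope(x, y), 3))
print("slope without the point:", round(ols_slope(x[:-1], y[:-1]), 3))
```

The distant point has $h_{ii} \approx 0.87$ against an average of 0.2, and the slope moves from 0.05 without it to 0.65 with it, so it is both a leverage point and an influential one.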

To further clarify the nature of outliers, leverage, and influence points, consider the diagram
in Figure 13.4, which is self-explanatory. 43

How do we handle such data points? Should we just drop them and confine our attention 
to the remaining data points? According to Draper and Smith: 

Automatic rejection of outliers is not always a wise procedure. Sometimes the outlier is
providing information that other data points cannot due to the fact that it arises from an unusual
combination of circumstances which may be of vital interest and requires further investigation
rather than rejection. As a general rule, outliers should be rejected out of hand only if they can
be traced to causes such as errors of recording the observations or setting up the apparatus [in
a physical experiment]. Otherwise, careful investigation is in order. 44


43 Adapted from John Fox, Applied Regression Analysis, Linear Models, and Related Methods, Sage 
Publications, California, 1997, p. 268. 

44 Norman R. Draper and Harry Smith, op. cit., p. 76.
What are some of the tests that one can use to detect outliers and leverage points? Several
tests are discussed in the literature, but we will not discuss them here because that would
take us far afield. 45 Software packages such as SHAZAM and MICROFIT have routines to
detect outliers, leverage, and influential points.

Recursive Least Squares 

In Chapter 8 we examined the question of the structural stability of a regression model 
involving time series data and showed how the Chow test can be used for this purpose. 
Specifically, you may recall that in that chapter we discussed a simple savings function (savings
as a function of income) for the United States for the period 1970-2005. There we saw
that the savings-income relationship probably changed around 1982. Knowing the point of
the structural break, we were able to confirm it with the Chow test.

But what happens if we do not know the point of the structural break (or breaks)? This
is where one can use recursive least squares (RELS). The basic idea behind RELS is very
simple and can be explained with the savings-income regression

$$Y_t = \beta_1 + \beta_2 X_t + u_t$$

where Y = savings and X = income and where the sample is for the period 1970-2005.
(See the data in Table 8.11.)

Suppose we first use the data for 1970-1974 and estimate the savings function, obtaining
the estimates of $\beta_1$ and $\beta_2$. Then we use the data for 1970-1975, again estimate the
savings function, and obtain the estimates of the two parameters. Then we use the data for
1970-1976 and re-estimate the savings model. In this fashion we go on adding an additional
data point on Y and X until we exhaust the entire sample. As you can imagine, each
regression run will give you a new set of estimates of $\beta_1$ and $\beta_2$. If you plot the estimated
values of these parameters against each iteration, you will see how the values of the estimated
parameters change. If the model under consideration is structurally stable, the changes in
the estimated values of the two parameters will be small and essentially random. However,
if the estimated values of the parameters change significantly, it would indicate a structural
break. RELS is thus a useful routine with time series data, since time is ordered
chronologically. It is also a useful diagnostic tool in cross-sectional data where the data are ordered
by some "size" or "scale" variable, such as the employment or asset size of the firm. In
Exercise 13.30 you are asked to apply RELS to the savings data given in Table 8.11.
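The expanding-window idea can be sketched as a brute-force loop that reruns OLS on the first m observations for m = start, ..., n. Production RELS routines use updating formulas rather than refitting from scratch, so this version only illustrates the logic. The series below is hypothetical, with a slope shift built in after observation 20; it merely stands in for the savings data of Table 8.11.

```python
import numpy as np


def recursive_ols(X: np.ndarray, y: np.ndarray, start: int) -> np.ndarray:
    """Re-estimate OLS on the first m observations for m = start, ..., n.

    Row j of the result holds the coefficient estimates from the expanding
    sample of size start + j.
    """
    n = len(y)
    estimates = []
    for m in range(start, n + 1):
        beta, *_ = np.linalg.lstsq(X[:m], y[:m], rcond=None)
        estimates.append(beta)
    return np.array(estimates)


# Hypothetical series of 36 "years" whose slope shifts after observation 20.
rng = np.random.default_rng(1)
x = np.linspace(10, 50, 36)
slope = np.where(np.arange(36) < 20, 0.10, 0.25)
y = 1.0 + slope * x + rng.normal(scale=0.2, size=36)
X = np.column_stack([np.ones_like(x), x])

paths = recursive_ols(X, y, start=5)   # start=5 mimics beginning with 1970-1974
print(np.round(paths[:, 1], 3))        # rows from index 16 on include post-break data
```

Plotting `paths[:, 0]` and `paths[:, 1]` against the row index gives the recursive coefficient plots described above; a steady drift after the break is the signal to look for.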

Software packages such as SHAZAM, EViews, and MICROFIT now do recursive least-squares
estimates routinely. RELS also generates recursive residuals on which several
diagnostic tests have been based. 46

Chow's Prediction Failure Test 

We have already discussed Chow’s test of structural stability in Chapter 8. Chow has shown 
that his test can be modified to test the predictive power of a regression model. Again, we 
will revert to the U.S. savings-income regression for the period 1970-1995. 


45 Here are some accessible sources: Alvin C. Rencher, Linear Models in Statistics, John Wiley & Sons, 
New York, 2000, pp. 219-224; A. C. Atkinson, Plots, Transformations and Regression: An Introduction 
to Graphical Methods of Diagnostic Regression Analysis, Oxford University Press, New York, 1985, 
Chapter 3; Ashis Sen and Muni Srivastava, Regression Analysis: Theory, Methods, and Applications, 
Springer-Verlag, New York, 1990, Chapter 8; and John Fox, op. cit., Chapter 11. 

46 For details, see Jack Johnston and John DiNardo, Econometric Methods, 4th ed., McGraw-Hill, New 
York, 1997, pp. 117-121. 
Suppose we estimate the savings-income regression for the period 1970-1981, obtaining
$\hat{\beta}_{1,70-81}$ and $\hat{\beta}_{2,70-81}$, which are the estimated intercept and slope coefficients based on
the data for 1970-1981. Now, using the actual values of income for the period 1982-1995
and the intercept and slope values for the period 1970-1981, we predict the values of
savings for each of the years 1982-1995. The logic here is that if there is no serious structural
change in the parameter values, the values of savings estimated for 1982-1995, based on
the parameter estimates for the earlier period, should not be very different from the actual
values of savings prevailing in the latter period. Of course, if there is a vast difference
between the actual and predicted values of savings for the latter period, it will cast doubt
on the stability of the savings-income relation for the entire data period.

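Whether the difference between the actual and estimated savings values is large or small
can be tested by the F test as follows:

$$F = \frac{\left(\sum \hat{u}_i^{*2} - \sum \hat{u}_i^2\right)/n_2}{\left(\sum \hat{u}_i^2\right)/(n_1 - k)} \tag{13.10.1}$$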


where $n_1$ = number of observations in the first period (1970-1981) on which the initial
regression is based, $n_2$ = number of observations in the second, or forecast, period,
$\sum \hat{u}_i^{*2}$ = RSS when the equation is estimated for all the observations ($n_1 + n_2$),
$\sum \hat{u}_i^2$ = RSS when the equation is estimated for the first $n_1$ observations, and k is the
number of parameters estimated (two in the present instance). If the errors are independent and
identically, normally distributed, the F statistic given in Eq. (13.10.1) follows the F distribution
with $n_2$ and $(n_1 - k)$ df, respectively. In Exercise 13.31 you are asked to apply Chow's predictive
failure test to find out if the savings-income relation has in fact changed. In passing, note
the similarity between this test and the forecast $\chi^2$ test discussed previously.
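Given the two residual sums of squares, Eq. (13.10.1) is simple arithmetic. The helper below is a hypothetical sketch, not a package routine; for the savings example, $n_1 = 12$ (1970-1981), $n_2 = 14$ (1982-1995), and k = 2. If SciPy is available, `scipy.stats.f.sf(F, n2, n1 - k)` returns the p value.

```python
def chow_prediction_f(rss_pooled: float, rss_first: float,
                      n1: int, n2: int, k: int) -> float:
    """Chow's prediction failure F statistic, Eq. (13.10.1).

    rss_pooled: RSS from the regression on all n1 + n2 observations
    rss_first:  RSS from the regression on the first n1 observations only
    Under the null of stable parameters, F follows F(n2, n1 - k).
    """
    numerator = (rss_pooled - rss_first) / n2
    denominator = rss_first / (n1 - k)
    return numerator / denominator


# Hypothetical RSS values, for illustration only.
print(chow_prediction_f(rss_pooled=120.0, rss_first=100.0, n1=30, n2=6, k=2))
```

With these illustrative numbers the statistic is (20/6)/(100/28), about 0.93, far too small to signal a prediction failure.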


Missing Data 

In applied work it is not uncommon to find that observations are missing from
the sample data. For example, in time series data there may be gaps in the data because of
special circumstances. During the Second World War, data on some macro variables were
not available or were not published for strategic reasons. In cross-section data it is not
uncommon to find that information on some variables for some individuals is missing,
especially in data collected from questionnaire-type surveys. In panel data also, over time some
respondents drop out or do not provide information on all the questions.

Whatever the reason, missing data is a problem that every researcher faces from time 
to time. The question is how we deal with the missing data. Is there any way to impute 
values to the missing observations? 

This is not an easy question to answer. Although there are some complicated solutions
suggested in the literature, we will not pursue them here because of their complexity. 47
However, we will discuss two cases. 48 In the first case, the reasons for the missing data are
independent of the available observations; Darnell calls this the "ignorable case." In
the second case, not only are the available data incomplete, but the missing observations may
be systematically related to the available data. This is a more serious case, for it may be the
result of self-selection bias, that is, the observed data are not truly randomly collected.


47 For a thorough, but rather advanced, treatment of the subject, see A. Colin Cameron and Pravin K. 
Trivedi, Microeconometrics: Methods and Applications, Cambridge University Press, New York, 2005, 
Chapter 27, pp. 923-941. 

48 The following discussion is based on Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar 
Publishing, Lyne, U.K., 1994, pp. 256-258. 
In the ignorable case, we may simply ignore the missing observations and use the available
observations. Most statistical packages do this automatically. Of course, in this case
the sample size is reduced and we may not be able to get precise estimates of the regression
coefficients. We might, however, use the available data to shed some light on the missing
observations. Here we consider three possibilities.

1. Out of a total of N observations, we have complete data on $N_1$ ($N_1 < N$) for
both the regressand and the k regressors, denoted by $Y_1$ and $X_1$, respectively. ($Y_1$ is a vector of
$N_1$ observations, and each row of $X_1$ is a row vector of the k regressors.)

2. For some observations ($N_2 < N$) there are complete data on the regressand, denoted by
$Y_2$, but incomplete observations on some of the regressors, $X_2$ (again these are vectors).

3. For some observations ($N_3 < N$), there are no data on Y, but complete data on X, denoted
by $X_3$.

In the first case, regression of $Y_1$ on $X_1$ will produce estimates of the regression coefficients
that are unbiased, but they may not be efficient because we ignore the $N_2$ and $N_3$ observations.
The other two cases are rather complicated, and we leave it to the reader to follow the
references for solutions. 49
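In the ignorable case, dropping incomplete records (often called listwise or complete-case deletion) is what most packages do by default. A minimal NumPy sketch, assuming missing values are coded as NaN, keeps only the rows of possibility 1 and runs the regression of $Y_1$ on $X_1$; the data and the function name are hypothetical.

```python
import numpy as np


def complete_cases(y: np.ndarray, X: np.ndarray):
    """Keep only rows with no missing (NaN) value in y or in any regressor."""
    keep = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
    return y[keep], X[keep], keep


# Hypothetical data: row 1 lacks Y (possibility 3), row 3 lacks a regressor (possibility 2).
y = np.array([3.1, np.nan, 4.0, 5.2, 6.1])
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 2.5],
              [1.0, np.nan],
              [1.0, 4.0]])

y1, X1, keep = complete_cases(y, X)
beta, *_ = np.linalg.lstsq(X1, y1, rcond=None)   # regression of Y1 on X1
print("rows kept:", keep, " estimates:", beta.round(4))
```

Only three of the five rows survive, which is the loss of efficiency the text warns about.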

13.11 Concluding Examples 

We conclude this chapter with two examples that illustrate one or more points raised in the 
chapter. The first example on wage determination uses cross-section data and the second 
example, which considers the real consumption function for the U.S., uses time series data. 

1. A Model of Hourly Wage Determination 

To examine what factors determine hourly wages, we consider a Mincer-type wage model, 
which has become popular with labor economists. This model has the following form: 50 

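$$\ln \text{wage}_i = \beta_1 + \beta_2 \text{Edu}_i + \beta_3 \text{Exp}_i + \beta_4 \text{Fe}_i + \beta_5 \text{NW}_i + \beta_6 \text{UN}_i + \beta_7 \text{WK}_i + u_i \tag{13.11.1}$$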

where ln wage = natural log of hourly wage ($), Edu = education in years, Exp = labor
market experience, Fe = 1 if female, 0 otherwise, NW = 1 if non-white, 0 otherwise, UN = 1 
if in union, 0 otherwise, and WK = 1 for non-hourly paid workers, 0 otherwise. For the 
non-hourly paid workers, the hourly wage is computed as weekly earnings divided by the 
usual hours worked. 

There are many more variables that could be added to this model. Some of these variables
are ethnic origin, marital status, number of children under age 6, and wealth or non-labor
income. For now, we will work with the model shown in Eq. (13.11.1).

The data consist of 1,289 persons interviewed in March 1985 as a part of the Current
Population Survey (CPS) periodically conducted by the U.S. Census Bureau. These data
were originally collected by Paul Ruud. 51

49 Besides the references already cited, see A. A. Afifi and R. M. Elashoff, "Missing Observations in
Multivariate Statistics," Journal of the American Statistical Association, vol. 61, 1966, pp. 595-604, and
vol. 62, 1967, pp. 10-29.

50 See J. Mincer, Schooling, Experience, and Earnings, Columbia University Press, New York, 1974.

51 Paul A. Ruud, An Introduction to Classical Econometric Theory, Oxford University Press, New York,
2000. We have not included data on age because it is highly collinear with job experience.
TABLE 13.4 EViews Regression Results Based on Equation (13.11.1)


A priori, we would expect education and experience to have a positive impact on wages. 
The dummy variables Fe and NW are expected to have a negative impact on wages if there 
is some kind of discrimination and UN is expected to have a positive impact because of 
uncertainty of income. 

When all the dummy variables take a value of zero, Eq. (13.11.1) reduces to

$$\ln \text{wage}_i = \beta_1 + \beta_2 \text{Edu}_i + \beta_3 \text{Exp}_i + u_i \tag{13.11.2}$$

which is the wage function for a non-unionized white male worker who is on an hourly 
wage rate. This is the base, or reference, category. 

Let us now present the regression results and then discuss them. 


Dependent Variable: LW
Method: Least Squares
Sample: 1-1,289
Included observations: 1,289

Variable     Coefficient    Std. Error    t Statistic    Prob.
C            [illegible in the source]
EDU, EXP     [rows missing from the source]
FE           -0.234934      0.026071      -9.011170      0.0000
NW           -0.124447      0.036340      -3.424498      0.0006
UN            0.207508      0.036265       5.721963      0.0000
WK            0.228725      0.028939       7.903647      0.0000

R-squared              0.376053    Mean dependent var.      2.342416
Adjusted R-squared     0.373133    S.D. dependent var.      0.586356
S.E. of regression     0.464247    Akaike info criterion    1.308614
Sum squared resid.     276.303     Schwarz criterion        1.336645
Log likelihood        -836.4018    Hannan-Quinn criter.     1.319136
F-statistic            128.7711    Durbin-Watson stat.      1.117004
Prob. (F-statistic)    0.000000




The first thing to notice is that all the estimated coefficients are individually highly
significant, for the p values are so low. The F value is also very high, suggesting that,
collectively as well, all the variables are statistically important.

Compared with the reference worker, the average wage of a female worker and of a non-white
worker is lower. Union workers and those who are paid weekly earn, on average, higher
wages.
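Because the regressand is ln wage, 100 times a dummy coefficient only approximates the percentage wage difference; the exact figure is $100(e^{\hat{\beta}} - 1)$, a standard result for dummies in semilog models that is not spelled out in this section. The sketch below applies it to the four dummy coefficients legible in Table 13.4.

```python
import math

# Dummy coefficients from Table 13.4 (log-wage regression).
dummy_coefs = {"FE": -0.234934, "NW": -0.124447, "UN": 0.207508, "WK": 0.228725}


def pct_effect(b: float) -> float:
    """Exact percentage wage difference implied by dummy coefficient b
    in a semilog model: 100 * (exp(b) - 1)."""
    return 100.0 * (math.exp(b) - 1.0)


for name, b in dummy_coefs.items():
    print(f"{name}: 100*b = {100 * b:7.2f}%   exact = {pct_effect(b):7.2f}%")
```

The female differential, for example, is about 20.9 percent rather than 23.5 percent; the gap between the two measures grows with the size of the coefficient.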

How adequate is model (13.11.1), given the variables we have considered? Is it possible
that non-white female workers earn less than white workers? Is it possible that non-white
female non-union workers earn less than white female non-union workers? In other
words, are there any interaction effects among the regressors, for example among the dummy
variables themselves?

Statistical packages have routines to answer such questions. For instance, EViews has 
such a facility. After a model is estimated, if you think that some variables can be added 
to the model but you are not sure of their importance, you can run the test of omitted 
variables. 

To show this, suppose we estimate Eq. (13.11.1) and now want to find out if the products
of FE and NW, FE and UN, and FE and WK should be added to the model to take into
account the interaction between the explanatory variables. Using the omitted-variables test
routine of EViews 6, we test the null hypothesis that these three added variables have no
effect on the estimated model.

As you would suspect, we can use the F test (discussed in Chapter 8) to assess the
incremental, or marginal, contribution of the added variables and test the null hypothesis. For
our example, the results are as follows:


TABLE 13.5 Partial EViews Results Using Interactions

Omitted Variables: FE*NW FE*UN FE*WK

F-statistic              0.805344    Prob. F(3,1279)        0.4909
Log likelihood ratio     2.43262     Prob. chi-square(3)    0.4876


We do not reject the null hypothesis that the interaction between female and non-white, 
female and union, and female and weekly wage earners, collectively, has no significant 
impact on the estimated model given in Table 13.4, for the estimated F value of 0.8053 is 
not statistically significant, the p value being about 49 percent. 
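Both statistics in Table 13.5 can be computed from quantities every regression printout reports. The sketch below assumes the Chapter 8 form of the incremental F test, with m added regressors and $k_{UR}$ parameters in the unrestricted model; for Table 13.5, m = 3, n = 1,289, and $k_{UR} = 10$, which gives the F(3, 1279) reference distribution shown. The LR statistic is $2(\ln L_{UR} - \ln L_R)$, referred to the chi-square distribution with m df. Function names are illustrative.

```python
def incremental_f(rss_restricted: float, rss_unrestricted: float,
                  m: int, n: int, k_unrestricted: int) -> float:
    """Incremental F test for adding m regressors.

    F = [(RSS_R - RSS_UR) / m] / [RSS_UR / (n - k_UR)], compared with
    F(m, n - k_UR) under the null that the added coefficients are all zero.
    """
    return ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / (n - k_unrestricted))


def lr_statistic(loglik_restricted: float, loglik_unrestricted: float) -> float:
    """Likelihood ratio statistic 2 (lnL_UR - lnL_R); chi-square with m df under the null."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


# Log likelihoods reported in Tables 13.4 and 13.6 (adding experience squared, m = 1).
print(round(lr_statistic(-836.4018, -811.9549), 4))
```

For instance, the log likelihoods of Tables 13.4 and 13.6 give an LR statistic of about 48.89 for adding experience squared, far above the 5 percent chi-square(1) critical value of 3.84.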

We leave it for the reader to try other combinations of the regressors to assess their 
contribution to the original model. 

Before proceeding further, note that model (13.11.1) assumes that the influence of experience
on log wages is linear; that is, holding other variables constant, the relative increase in wages
(remember the regressand is in log form) remains the same for every additional year of job
experience. This assumption may be true over some years of experience, but, as basic labor
economics suggests, as workers get older, the rate of wage increase decreases. To see if this
is the case in our example, we added the squared experience term to our initial model and
obtained the following results:


TABLE 13.6 EViews Results with Experience Squared


Dependent Variable: LW
Method: Least Squares
Sample: 1-1,289
Included observations: 1,289

[Coefficient rows largely illegible in the source; legible coefficient values include 0.079867, 0.036659 (Exp), and -0.228848.]

R-squared              0.399277    Mean dependent var.      2.342416
Adjusted R-squared     0.395996    S.D. dependent var.      0.586356
S.E. of regression     0.455703    Akaike info criterion    1.272234
Sum squared resid.     266.0186    Schwarz criterion        1.304269
Log likelihood        -811.9549    Hannan-Quinn criter.     1.284259
F-statistic            121.6331    Durbin-Watson stat.      1.971753
Prob. (F-statistic)    0.000000




The squared experience term is not only negative but also highly statistically significant.
It also accords with labor market behavior: over time, the rate of growth of wages
slows down, since

$$\frac{\partial \ln \text{wage}}{\partial \text{Exp}} = 0.0366 - 0.0012\,\text{Exp}$$
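The marginal effect above is easy to tabulate. The short sketch below uses only the rounded coefficients quoted in the text, so its numbers are approximate; it also computes the experience level at which the effect reaches zero, $0.0366/0.0012 \approx 30.5$ years, which lies inside the sample range of experience (up to 56 years in Table 13.8).

```python
# Marginal effect of experience on ln(wage) implied by the text, using
# the rounded values d ln(wage) / d Exp = 0.0366 - 0.0012 * Exp.
B_EXP = 0.0366
TWO_B_EXP2 = 0.0012   # twice the absolute coefficient on Exp squared


def marginal_effect(exp_years: float) -> float:
    """Approximate proportional wage change from one more year of experience."""
    return B_EXP - TWO_B_EXP2 * exp_years


turning_point = B_EXP / TWO_B_EXP2   # experience at which the effect reaches zero
for e in (0, 10, 19, 30):
    print(f"Exp = {e:2d}: about {100 * marginal_effect(e):.2f}% per extra year")
print(f"turning point: about {turning_point:.1f} years")
```

At the sample mean of roughly 19 years, an extra year of experience is associated with a wage gain of about 1.4 percent, compared with about 3.7 percent at the start of a career.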
We take this opportunity to discuss the Akaike and Schwarz criteria. Like $R^2$, these are
measures of the goodness of fit of the estimated model; the difference is that under the $R^2$
criterion, the higher its value, the better the model explains the behavior of the regressand,
whereas under the Akaike and Schwarz criteria, the lower the value of these
statistics, the better the model.

Of course, all these criteria are meaningful only when we want to compare two or more models.
Thus, if you compare the model in Table 13.4 with the model in Table 13.6, which has
experience squared as an additional regressor, you will see that the model in Table 13.6 is
preferable to the one in Table 13.4 on the basis of all three criteria reported (Akaike, Schwarz,
and Hannan-Quinn).
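The Akaike, Schwarz, and Hannan-Quinn values in Tables 13.4 and 13.6 can be reproduced from the log likelihood. The sketch assumes EViews' likelihood-based definitions, $\text{AIC} = -2\ln L/n + 2k/n$, $\text{SC} = -2\ln L/n + k\ln n/n$, and $\text{HQ} = -2\ln L/n + 2k\ln(\ln n)/n$, with k = 7 and k = 8 estimated coefficients. With normal errors, $\ln L = -(n/2)[1 + \ln 2\pi + \ln(\text{RSS}/n)]$, so these values differ from the log forms in Eqs. (13.9.4) and (13.9.6) only by the constant $1 + \ln 2\pi$ and rank models identically.

```python
import math


def eviews_style_criteria(loglik: float, n: int, k: int) -> dict:
    """Likelihood-based AIC, Schwarz (SC), and Hannan-Quinn (HQ) criteria,
    each scaled per observation as in the EViews output tables."""
    base = -2.0 * loglik / n
    return {
        "AIC": base + 2.0 * k / n,
        "SC": base + k * math.log(n) / n,
        "HQ": base + 2.0 * k * math.log(math.log(n)) / n,
    }


table_13_4 = eviews_style_criteria(-836.4018, 1289, 7)   # 7 coefficients
table_13_6 = eviews_style_criteria(-811.9549, 1289, 8)   # adds Exp squared
for name in ("AIC", "SC", "HQ"):
    print(f"{name}: Table 13.4 = {table_13_4[name]:.6f}, Table 13.6 = {table_13_6[name]:.6f}")
```

Each criterion is lower for the model with experience squared, matching the comparison in the text.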

Incidentally, note that in both models the $R^2$ values seem "low," but such low values are
typically observed in cross-section data with a large number of observations. However, note that
this "low" $R^2$ value is statistically significant, since in both models the computed F statistic is
highly significant (recall the relationship between F and $R^2$ discussed in Chapter 8).

Let us continue with the expanded model given in Table 13.6. Although the model looks 
satisfactory, let us explore a couple of points. First, since we are dealing with cross-section 
data, there is every chance that the model suffers from heteroscedasticity. So, we need to 
find out if this is the case. We applied several of the tests of heteroscedasticity discussed in 
Chapter 11 and found that the model does in fact suffer from heteroscedasticity. The reader 
should verify this assertion. 

To correct for the observed heteroscedasticity, we can obtain White's heteroscedasticity-consistent
standard errors, which were discussed in Chapter 11. The results are given in the
following table.


TABLE 13.7 EViews Results Using White's Corrected Standard Errors


Dependent Variable: LW
Method: Least Squares
Sample: 1-1,289
Included observations: 1,289
White's Heteroscedasticity-Consistent Standard Errors

[Coefficient and standard-error columns illegible in the source; legible t statistics include 9.675724, -8.882625, -3.614573, 6.668458, and 6.470218.]

R-squared              0.399277    Mean dependent var.      2.342416
Adjusted R-squared     0.395996    S.D. dependent var.      0.586356
S.E. of regression     0.455703    Akaike info criterion    1.272234
Sum squared resid.     266.0186    Schwarz criterion        1.304269
Log likelihood        -811.9549    Hannan-Quinn criter.     1.284259
F-statistic            121.6331    Durbin-Watson stat.      1.971753
Prob. (F-statistic)    0.000000




As you would expect, there are some changes in the estimated standard errors, although 
this does not change the conclusion that all the regressors are important, both individually 
as well as collectively, in explaining the behavior of relative wages. 
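For readers who want to see what the White correction does, here is a minimal NumPy sketch of White's original estimator (often labeled HC0), $(X'X)^{-1}X'\,\mathrm{diag}(\hat{u}_i^2)\,X(X'X)^{-1}$. Software packages may multiply it by a small-sample factor such as $n/(n - k)$ (the HC1 variant), so the numbers need not match a given package exactly. The data are simulated, with an error spread that grows with x; the function name is illustrative.

```python
import numpy as np


def white_hc0_se(X: np.ndarray, y: np.ndarray):
    """OLS coefficients with White's (HC0) heteroscedasticity-consistent SEs.

    Var(b) = (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    meat = (X * e[:, None] ** 2).T @ X      # sum over i of e_i^2 x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


# Hypothetical heteroscedastic data: the error standard deviation grows with x.
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 * x)
X = np.column_stack([np.ones(n), x])

beta, se_white = white_hc0_se(X, y)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se_ols = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))   # usual OLS SEs
print("coefficients   :", beta.round(3))
print("usual OLS SEs  :", se_ols.round(4))
print("White (HC0) SEs:", se_white.round(4))
```

As in Table 13.7, the coefficients are unchanged; only the standard errors, and hence the t statistics, move.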

Let us now examine if the error terms are normally distributed. The histogram of the
residuals obtained from the model in Table 13.7 is shown in Figure 13.5. The Jarque-Bera
statistic rejects the hypothesis that the errors are normally distributed, for the JB statistic
is high and the p value is practically zero. Note that for a normally distributed variable, the
skewness and kurtosis coefficients are, respectively, 0 and 3.

FIGURE 13.5 A histogram of the residuals obtained from the regression in Table 13.7 (Series: RESID; Sample: 1-1,289; Observations: 1,289; Mean -9.38e-09; Median -0.850280; Minimum -20.58590; Std. Dev. 6.324574; Skewness 1.721323; Kurtosis 10.72500; Jarque-Bera 3841.617; Probability 0.000000).
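The JB statistic combines the skewness S and kurtosis K of the residuals as $\text{JB} = n[S^2/6 + (K - 3)^2/24]$; under normality it is asymptotically chi-square with 2 df, whose survival function is simply $e^{-\text{JB}/2}$. The sketch below assumes moment (divide-by-n) estimators of S and K; fed the skewness and kurtosis reported with Figure 13.5, it reproduces the reported JB value of about 3841.6.

```python
import numpy as np


def jarque_bera_from_moments(n: int, skewness: float, kurtosis: float) -> float:
    """JB = n [S^2/6 + (K - 3)^2 / 24]."""
    return n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)


def jarque_bera(resid: np.ndarray) -> float:
    """JB computed from residuals using moment (divide-by-n) estimators."""
    e = resid - resid.mean()
    m2 = np.mean(e ** 2)
    s = np.mean(e ** 3) / m2 ** 1.5
    k = np.mean(e ** 4) / m2 ** 2
    return float(jarque_bera_from_moments(len(e), s, k))


def jb_pvalue(jb: float) -> float:
    """Chi-square(2) survival function, which reduces to exp(-JB/2)."""
    return float(np.exp(-jb / 2.0))


# Reproduce the value reported with Figure 13.5 from its skewness and kurtosis.
jb = jarque_bera_from_moments(1289, 1.721323, 10.72500)
print(round(jb, 1), jb_pvalue(jb))
```

The p value underflows to zero, which is why the output reports a probability of 0.000000.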

Now what? Our hypothesis testing procedure thus far has rested on the assumption that the 
disturbance, or error, term in the regression model is normally distributed. Does this mean 
that we cannot legitimately use the t and F tests to test hypotheses in our wage regression? 

The answer is no. As noted in the chapter, the OLS estimators are asymptotically normally
distributed, provided that the error term has finite variance, is homoscedastic, and has a
mean value of zero given the values of the explanatory variables. As a result, we can continue
to use the usual t and F tests, provided the sample is reasonably large. In passing, note that
we did not need the normality assumption to obtain the OLS estimators. Even without the
normality assumption, the OLS estimators are best linear unbiased estimators (BLUE) under the
Gauss-Markov assumptions.

How large is a large sample? There is no definitive answer to this question, but the sample
size of 1,289 observations in our wage regression seems reasonably large.

Are there any "outliers" in our wage regression? Some idea about this can be gleaned
from the graph in Figure 13.6, which gives the actual and estimated values of the dependent
variable (ln wage) and the residuals, which are the differences between the actual and
estimated values of the regressand.

FIGURE 13.6 Residuals vs. estimated values of the dependent variable, ln wage

TABLE 13.8

TABLE 13.9 Results of Regression Equation (13.11.3)

Although the mean value of the residuals is always zero (why?), the graph in Figure 13.6
shows that there are several residuals that seem large (in absolute value) compared with the
bulk of the residuals. It is possible that there are outliers in the data. We provide the raw
statistics on the three quantitative variables in Table 13.8 to aid the reader in deciding
whether there are indeed outliers.


Sample: 1-1,289

                  W            EDU          EXP
Mean            12.36585     13.14507     18.78976
Median          10.08000     12.00000     18.00000
Maximum         64.08000     20.00000     56.00000
Minimum          0.840000     0.000000     0.000000
Std. Dev.        7.896350     2.813823    11.66284
Skewness         1.848114    -0.290381     0.375669
Kurtosis         7.836565     5.977464     2.327946
Jarque-Bera   1990.134      494.2552      54.57664
Probability      0.000000     0.000000     0.000000
Sum          15939.58     16944.00     24220.00
Sum Sq. Dev. 80309.82     10197.87    175196.0
Observations     1,289        1,289        1,289

2. Real Consumption Function for the United States, 1947-2000

In Chapter 10 we considered the consumption function for the U.S. for the years 1947-2000.
The specific form of the consumption function we considered was:

$$\ln \text{TC}_t = \beta_1 + \beta_2 \ln \text{YD}_t + \beta_3 \ln \text{W}_t + \beta_4 \text{Interest}_t + u_t \tag{13.11.3}$$

where TC, YD, W, and Interest are, respectively, total consumption expenditure, personal
disposable income, wealth, and interest rate, all in real terms. The results based on our data
are as follows:




Method: Least Squares
Sample: 1947-2000
Included observations: 54

Variable        Coefficient    Std. Error    t Statistic    Prob.
C               -0.4674        0.042751     -10.93343      0.0000
LOG(YD)          0.804873      0.017498      45.99836      0.0000
LOG(WEALTH)      0.201270      0.017593      11.44060      0.0000
INTEREST        -0.002689      0.000762      -3.529265     0.0009

R-squared              0.999560    Mean dependent var.      7.826093
Adjusted R-squared     0.999533    S.D. dependent var.      0.552368
S.E. of regression     0.011934    Akaike info criterion   -5.947703
Sum squared resid.     0.007121    Schwarz criterion       -5.800371
Log likelihood         164.5880    Hannan-Quinn criter.    -5.890883
F-statistic            37832.59    Durbin-Watson stat.      1.289219
Prob. (F-statistic)    0.000000
Since TC, YD, and Wealth enter in logarithmic form, the estimated slope coefficients of YD and Wealth are, respectively, the income and wealth elasticities. As you would expect, these elasticities are positive and highly statistically significant. Numerically, the income and wealth elasticities are about 0.80 and 0.20. The coefficient of the interest rate variable represents a semielasticity (why?). Holding other variables constant, the results show that if the interest rate goes up by 1 percentage point, real consumption expenditure goes down by about 0.27 percent on average. Note that the estimated semielasticity is also highly statistically significant.
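The arithmetic behind these interpretations can be sketched as follows. In a log-log term the slope is an elasticity, so a 1 percent rise in YD raises TC by roughly 0.80 percent. In the interest-rate term, 100 times the semielasticity gives the approximate percent change in TC for a 1-point rise in the rate, and 100[exp(b) − 1] gives the exact figure. The helper names are hypothetical. The coefficients are the ones reported in the regression above.

```python
import math

def loglin_pct_change(coef, dx=1.0):
    """Percent change in Y when X rises by dx, for ln Y = ... + coef * X.

    Returns (approximate, exact) = (100*coef*dx, 100*(exp(coef*dx) - 1)).
    """
    return 100.0 * coef * dx, 100.0 * (math.exp(coef * dx) - 1.0)

def loglog_pct_change(elasticity, pct_x=1.0):
    """Exact percent change in Y for a pct_x percent rise in X, for ln Y = ... + elasticity * ln X."""
    return 100.0 * ((1.0 + pct_x / 100.0) ** elasticity - 1.0)

approx, exact = loglin_pct_change(-0.002689)   # interest rate up by 1 point
print(f"Interest +1 point: about {approx:.3f}% (exact {exact:.3f}%)")
print(f"YD +1%: {loglog_pct_change(0.804873):.3f}%")
print(f"Wealth +1%: {loglog_pct_change(0.201270):.3f}%")
```

For a coefficient this small, the approximate and exact figures are practically identical. The difference matters only for large coefficients or large changes in X.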

Look at some of the summary statistics. The R² value is very high, almost reaching 100 percent. The F value is also highly statistically significant, suggesting that the explanatory variables have a significant impact on consumption expenditure not only individually but also collectively.
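The summary statistics are internally consistent. Using only the sum of squared residuals, the standard deviation of the dependent variable, n = 54, and k = 4 estimated coefficients, you can recover R² and the F statistic. The sketch below assumes the usual definitions TSS = (n − 1)·S.D.², R² = ESS/TSS, and F = [ESS/(k − 1)]/[RSS/(n − k)].

```python
n, k = 54, 4                 # observations; coefficients including the intercept
rss = 0.007121               # Sum squared resid.
sd_y = 0.552368              # S.D. dependent var.

tss = sd_y ** 2 * (n - 1)    # total sum of squares
ess = tss - rss              # explained sum of squares
r2 = ess / tss
f_stat = (ess / (k - 1)) / (rss / (n - k))
print(f"R-squared = {r2:.6f}, F = {f_stat:.1f}")
```

The small gap between this F value and the printed 37832.59 comes from rounding in the reported RSS.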

The Durbin-Watson statistic, however, suggests that the errors in the model are serially correlated. If we consult the Durbin-Watson tables (Table D.5 in Appendix D), we see that for 55 observations (the closest number to 54) and three explanatory variables, the lower and upper 5 percent critical d values are 1.452 and 1.681. Since the observed d in our example, 1.2892, is below the lower critical d value, we may conclude that the errors in our consumption function are positively autocorrelated. This should not be a surprising finding, for many time series regressions suffer from autocorrelation.
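The d statistic and the decision rule just applied can be written out as a short sketch. The critical bounds dL and dU still have to come from the Durbin-Watson tables (here 1.452 and 1.681, as quoted above). The function names are our own.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    e = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def dw_decision(d, d_lower, d_upper):
    """Classical five-region Durbin-Watson decision rule."""
    if d < d_lower:
        return "positive autocorrelation"
    if d < d_upper:
        return "inconclusive"
    if d <= 4.0 - d_upper:
        return "no autocorrelation"
    if d <= 4.0 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"

# d from the regression above, with the 5 percent bounds quoted in the text
print(dw_decision(1.289219, d_lower=1.452, d_upper=1.681))
```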

But before we accept this conclusion, let us find out if there are any specification errors. As we know, what appears to be autocorrelation sometimes arises because we have omitted some important variables. To see if this is the case, we consider the regression given in Table 13.10.


TABLE 13.10

Dependent Variable: LTC
Method: Least Squares
Sample: 1947-2000
Included observations: 54

Variable        Coefficient   Std. Error   t Statistic   Prob.
C                  2.689644     0.566034      4.751737   0.0000
LYD                0.512836     0.054056      9.487076   0.0000
LW                -0.205281     0.074068     -2.771510   0.0079
INTEREST          -0.001162     0.000661     -1.759143   0.0848
LYD*LW             0.039901     0.007141      5.587986   0.0000

R-squared             0.999731   Mean dependent var.      7.826093
Adjusted R-squared    0.999709   S.D. dependent var.      0.552368
S.E. of regression    0.009431   Akaike info criterion   -6.403689
Sum squared resid.    0.004349   Schwarz criterion       -6.219524
Log likelihood        177.8996   Hannan-Quinn criter.    -6.332663
F-statistic           45534.94   Durbin-Watson stat.      1.530268
Prob. (F-statistic)   0.000000




The additional variable in this model is the interaction of the logs of disposable income and wealth. This interaction term is highly significant. Notice that the interest variable has now become less significant (p value of about 8 percent), although it retains its negative sign. But now the Durbin-Watson d value has increased from about 1.29 to about 1.53.

The 5 percent critical d values now are 1.378 and 1.721. The observed d value of 1.53 
lies between these values, suggesting that, on the basis of the Durbin-Watson statistic, we 
cannot determine whether or not we have autocorrelation. However, the observed d value is below the upper limit d value. As noted in the chapter on autocorrelation, some authors suggest using the upper limit of the d statistic as approximately the true significance limit. Therefore, if the computed d value is below the upper limit, there is evidence of positive autocorrelation. By that criterion, we can conclude in the present instance that our model suffers from positive autocorrelation.

We also applied the Breusch-Godfrey test of autocorrelation that we discussed in Chapter 12. Adding the two lagged terms of the estimated residuals in Equation (12.6.15) to the model in Table 13.9, we obtained the following results:


TABLE 13.11


Breusch-Godfrey Serial Correlation LM Test:

F-statistic       3.254131   Prob. F(2,48)           0.0473
Obs*R-squared     6.447576   Prob. Chi-Square(2)     0.0398

Dependent Variable: RESID
Method: Least Squares
Sample: 1947-2000
Included observations: 54
Presample missing value lagged residuals set to zero.

Variable        Coefficient   Std. Error   t Statistic   Prob.
C                 -0.006514     0.041528     -0.156851   0.8760
LYD               -0.004197     0.017158     -0.244618   0.8078
LW                 0.004191     0.017271      0.242674   0.8093
INTEREST           0.000116     0.000736      0.156964   0.8753
RESID(-1)          0.385190     0.151581      2.541147   0.0143
RESID(-2)         -0.165609     0.154695     -1.070556   0.2897

R-squared             0.119400   Mean dependent var.      9.02E-17
Adjusted R-squared    0.027670   S.D. dependent var.      0.011591
S.E. of regression    0.011431   Akaike info criterion   -6.000781
Sum squared resid.    0.006270   Schwarz criterion       -5.779782
Log likelihood        168.0211   Hannan-Quinn criter.    -5.915550
F-statistic           1.301653   Durbin-Watson stat.      1.848014
Prob. (F-statistic)   0.2790





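The F statistic reported at the top tests the hypothesis that the coefficients of the two lagged residuals included in the model are zero. This hypothesis is rejected because the F is significant at about the 5 percent level (p value 0.0473). The Obs*R-squared (LM) statistic, with a p value of 0.0398, leads to the same conclusion.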
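The mechanics of the Breusch-Godfrey test can be sketched in a few lines of numpy. First regress y on X. Then regress the residuals on X and p lagged residuals, setting presample lags to zero as in the output above. Finally, form LM = nR² and the companion F statistic. The function names and the simulated AR(1) data are hypothetical illustrations, not the consumption data.

```python
import numpy as np

def _ols_resid(X, y):
    """Residuals from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def breusch_godfrey(X, y, p=2):
    """Breusch-Godfrey LM test for serial correlation of order up to p.

    X must contain a column of ones. Presample lagged residuals are set to
    zero. Returns (LM, F, R2_aux): LM = n * R2_aux is asymptotically
    chi-square(p); F has (p, n - k - p) degrees of freedom.
    """
    n, k = X.shape
    e = _ols_resid(X, y)
    lags = [np.concatenate([np.zeros(j), e[:-j]]) for j in range(1, p + 1)]
    Z = np.column_stack([X] + lags)
    v = _ols_resid(Z, e)
    r2_aux = 1.0 - (v @ v) / (e @ e)   # e has mean zero because X includes a constant
    lm = n * r2_aux
    f = (r2_aux / p) / ((1.0 - r2_aux) / (n - k - p))
    return lm, f, r2_aux

# Hypothetical example: 54 observations with AR(1) errors
rng = np.random.default_rng(0)
n = 54
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal(scale=0.5)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + u
print(breusch_godfrey(X, y, p=2))
```

With n = 54, k = 4, and p = 2, the auxiliary R² of 0.119400 reported above gives LM = 54 × 0.1194 ≈ 6.45 and F ≈ 3.25 with (2, 48) df. These values match the printed test statistics.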

To sum up, it seems that there is autocorrelation in the error term. We can apply one or 
more procedures discussed in Chapter 12 to remove autocorrelation. But to save space, we 
leave that task to the reader. 

In Table 13.12 we report the regression results with HAC, or Newey-West, standard errors, which take the autocorrelation into account. Because HAC standard errors are justified on large-sample grounds, it matters that our sample of 54 observations is reasonably large.

If you compare these results with those given in Table 13.9, you will observe that the 
regression coefficients remain the same, but that the standard errors are somewhat different. 
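A minimal numpy sketch of Newey-West standard errors with Bartlett weights and lag truncation L follows (Table 13.12 uses L = 3). It shows why the coefficients do not change: only the covariance matrix is re-estimated. The function name and simulated data are hypothetical. The sketch applies no degrees-of-freedom correction, so its numbers need not match EViews output exactly.

```python
import numpy as np

def ols_newey_west(X, y, lags=3):
    """OLS coefficients with Newey-West HAC standard errors (Bartlett weights)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y            # same point estimates as ordinary OLS
    e = y - X @ beta
    Xe = X * e[:, None]                  # row t is e_t * x_t'
    S = Xe.T @ Xe                        # lag-0 term: sum of e_t^2 x_t x_t'
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)       # Bartlett weight
        G = Xe[l:].T @ Xe[:-l]           # sum over t of e_t e_{t-l} x_t x_{t-l}'
        S += w * (G + G.T)
    V = XtX_inv @ S @ XtX_inv            # sandwich covariance matrix
    return beta, np.sqrt(np.diag(V))

# Hypothetical example: 54 observations with AR(1) errors
rng = np.random.default_rng(42)
n = 54
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal(scale=0.1)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.8 * x + u
beta, se_hac = ols_newey_west(X, y, lags=3)
print(beta, se_hac)
```

With lags = 0, the sketch reduces to White's heteroscedasticity-consistent (HC0) standard errors.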

TABLE 13.12

Dependent Variable: LTC
Method: Least Squares
Sample: 1947-2000
Included observations: 54
Newey-West HAC Standard Errors & Covariance (lag truncation = 3)

Variable        Coefficient   Std. Error   t Statistic   Prob.
C                 -0.467714     0.043937    -10.64516    0.0000
LYD                0.804871     0.017117     47.02132    0.0000
LW                 0.201272     0.015447     13.02988    0.0000
INTEREST          -0.002689     0.000880     -3.056306   0.0036

R-squared             0.999560   Mean dependent var.      7.826093
Adjusted R-squared    0.999533   S.D. dependent var.      0.552368
S.E. of regression    0.011934   Akaike info criterion   -5.947707
Sum squared resid.    0.007121   Schwarz criterion       -5.800374
Log likelihood        164.5881   Hannan-Quinn criter.    -5.890886
F-statistic           37832.71   Durbin-Watson stat.      1.28923
Prob. (F-statistic)   0.000000

In this chapter we discussed Chow’s prediction failure test. We have a sample period that extends from 1947 to 2000. Over this period, we have had several business cycles, mostly of short duration. For example, there was a recession in 1990 and another one in 2000. Is the behavior of consumer expenditure in relation to income, wealth, and the interest rate different during recessions?

To shed light on this question, let us consider the 1990 recession and apply Chow’s prediction failure test. The details of this test were discussed earlier in the chapter. Using Chow’s predictive failure test in EViews, version 6, we obtain the results given in Table 13.13.


TABLE 13.13
Chow’s Test of Predictive Failure


Chow’s Forecast Test: Forecast from 1991 to 2000

F-statistic            [illegible]   Prob. F(10,40)          [illegible]
Log likelihood ratio   21.51348      Prob. Chi-Square(10)    [illegible]

Dependent Variable: LTC
Method: Least Squares
Sample: 1947-1990
Included observations: 44

Variable        Coefficient   Std. Error   t Statistic   Prob.
C                 -0.287952     0.095089     -3.028236   0.0043
LYD                0.853172     0.028473     29.96474    0.0000
LW                 0.141513     0.033085      4.277239   0.0001
INTEREST          -0.002060     0.000804     -2.562790   0.0143

R-squared             0.999496   Mean dependent var.      7.659729
Adjusted R-squared    0.999458   S.D. dependent var.      0.46938
S.E. of regression    0.010933   Akaike info criterion   -6.10764
Sum squared resid.    0.00478    Schwarz criterion       -5.94544
Log likelihood        138.368    Hannan-Quinn criter.    -6.047489
F-statistic           26430.49   Durbin-Watson stat.      1.262748
Prob. (F-statistic)   0.000000
The F statistic given in the top portion of Table 13.13 suggests that there is probably not a substantial difference in the consumption function pre- and post-1990, because the F value is not significant at the 5 percent level. But if you choose the 10 percent level of significance, the F value is statistically significant.
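The F statistic can be rebuilt by hand from the residual sums of squares. The full 1947-2000 regression (Table 13.9) has RSS = 0.007121, and the 1947-1990 regression (Table 13.13) has RSS of about 0.00478. The sketch below assumes the standard predictive-failure form F = [(RSS_full − RSS_1)/n2]/[RSS_1/(n1 − k)]. The function name is our own. Because the 1947-1990 RSS is read off the table to limited precision, the result is approximate.

```python
def chow_forecast_f(rss_full, rss_first, n1, n2, k):
    """Chow predictive-failure F: ((RSS_full - RSS_1) / n2) / (RSS_1 / (n1 - k)).

    Under the null of no structural change, F follows F(n2, n1 - k).
    """
    return ((rss_full - rss_first) / n2) / (rss_first / (n1 - k))

# RSS for 1947-2000 (Table 13.9) and 1947-1990 (Table 13.13);
# n1 = 44 estimation years, n2 = 10 forecast years, k = 4 coefficients
f = chow_forecast_f(rss_full=0.007121, rss_first=0.00478, n1=44, n2=10, k=4)
print(f"F(10, 40) = {f:.2f}")
```

The result, about 1.96, lies between the approximate 10 percent and 5 percent critical values of F(10, 40), roughly 1.76 and 2.08. This agrees with the conclusion drawn in the text.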

We can look at this problem differently. In Chapter 8 we discussed a test of parameter 
stability. To see if there has been any statistically significant change in the consumption 
function regression coefficients, we used the Chow test discussed in Section 8.7 of Chapter 8 
and obtained the results given in Table 13.14. 
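The parameter-stability version of the Chow test compares the RSS from the pooled regression with the sum of the RSS values from separate regressions for the two subperiods. The sketch below assumes the usual form F = [(RSS_R − RSS_UR)/k]/[RSS_UR/(n − 2k)] with equal error variances in the two subperiods. It uses hypothetical simulated data, not the consumption data.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_breakpoint(X, y, split):
    """Chow test of parameter stability for rows [:split] versus [split:].

    Returns (F, df1, df2) with F = [(RSS_R - RSS_UR)/k] / [RSS_UR/(n - 2k)],
    where RSS_UR = RSS_1 + RSS_2. Assumes equal error variances in the two
    subperiods and more than k observations in each.
    """
    n, k = X.shape
    rss_r = rss(X, y)
    rss_ur = rss(X[:split], y[:split]) + rss(X[split:], y[split:])
    f = ((rss_r - rss_ur) / k) / (rss_ur / (n - 2 * k))
    return f, k, n - 2 * k

# Hypothetical data: 54 periods, intercept and slope shift after period 44
rng = np.random.default_rng(7)
n, split = 54, 44
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = np.where(np.arange(n) < split, 1.0 + 2.0 * x, 3.0 + 0.5 * x) + rng.normal(scale=0.1, size=n)
print(chow_breakpoint(X, y, split))
```

This illustration has two coefficients, so its degrees of freedom are (2, 50). With the four coefficients of the consumption function and n = 54, they become (4, 46), as in Table 13.14.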


TABLE 13.14
Chow’s Test of Parameter Stability

Chow Breakpoint Test: 1990
Null Hypothesis: No breaks at specified breakpoints
Varying regressors: All equation variables
Equation Sample: 1947-2000

F-statistic            4.254054   Prob. F(4,46)           0.0052
Log likelihood ratio  16.99654    Prob. Chi-Square(4)     0.0019
Wald Statistic        17.01622    Prob. Chi-Square(4)     0.0019


Apparently, the consumption functions pre- and post-1990 are statistically different. The computed F statistic, following Eq. (8.7.4), is highly statistically significant: its p value is only 0.0052.

The reader is encouraged to apply Chow’s parameter stability and predictive failure tests to determine whether the consumption function has changed since 2000. To do this, you will have to extend the data beyond 2000. Also note that to apply these tests, the number of observations must be greater than the number of coefficients estimated.

We have not exhausted all of the diagnostic tests that we can apply to our consumption data. But the analysis provided thus far should give you a fairly good idea about how one can apply the various tests.

13.12 Non-Normal Errors and Stochastic Regressors

In this section we discuss two topics of a somewhat advanced nature: the non-normal distribution of the error term, and stochastic, or random, regressors, along with their practical importance.

1. What Happens If the Error Term Is Not Normally Distributed?

In the classical normal linear regression model (CNLRM) discussed in Chapter 4, we assumed that the error term u follows the normal distribution. We invoked the central limit theorem (CLT) to justify the normality assumption. Because of this assumption, we were able to establish that the OLS estimators are also normally distributed. As a result, we were able to do hypothesis testing using the t and F tests regardless of the sample size. We also discussed using the Jarque-Bera and Anderson-Darling normality tests to find out whether the estimated errors (residuals) are normally distributed in any practical application.

What happens if the errors are not normally distributed? The OLS estimators are still BLUE. That is, they are unbiased and have minimum variance in the class of linear unbiased estimators. Intuitively, this should not be surprising, for we did not need the normality assumption to establish the Gauss-Markov (BLUE) theorem.

Then what is the problem?

The problem is that we need the sampling, or probability, distributions of the OLS estimators. Without them we cannot engage in any kind of hypothesis testing regarding the true values of the parameters they estimate. As shown in Chapters 3 and 7, the OLS estimators are linear functions of the dependent variable Y, and Y itself is a linear function of the stochastic error term u, assuming that the explanatory variables are nonstochastic, or fixed in repeated sampling. Ultimately, then, we need the probability distribution of u.

As noted above, the classical normal linear regression model (CNLRM) assumes that the error term follows the normal distribution (with zero mean and constant variance). Using the central limit theorem (CLT) to justify the normality of the error term, we were able to show that the OLS estimators themselves are normally distributed with the means and variances discussed in Chapters 4 and 7. This in turn allowed us to use the t and F statistics in hypothesis testing in small, or finite, samples as well as in large samples. The normality assumption therefore plays a critical role, especially in small samples.

But what if we cannot maintain the normality assumption on the basis of various normality tests? What then? We have two choices. The first is bootstrapping and the second is to invoke large, or asymptotic, sample theory.

A discussion of bootstrapping, which is gradually seeping into applied econometrics, would take us far afield. The basic idea underlying bootstrapping is to churn (or regurgitate) a given sample over and over again, that is, to draw repeated samples from it, and then obtain the sampling distributions of the parameters of interest (OLS estimators for our purpose). How this is done in practice is best left for references. 52 By the way, the term bootstrapping comes from the commonly used expression, “to pull oneself up by one’s own bootstraps.”
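For a flavor of the idea without the theory, here is one common variant, the pairs (or case) bootstrap, sketched in Python. It resamples the (X, Y) pairs with replacement, re-estimates the slope each time, and uses the spread of the re-estimates as an estimate of the slope's sampling distribution. The function name, number of replications, and simulated skewed-error data are all illustrative assumptions. The references cited discuss other variants, such as resampling residuals.

```python
import numpy as np

def pairs_bootstrap_slope(x, y, reps=2000, seed=0):
    """Pairs bootstrap for an OLS slope.

    Returns the bootstrap standard error and a 95% percentile interval.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(reps)
    for b in range(reps):
        idx = rng.integers(0, n, size=n)              # draw n rows with replacement
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]  # polyfit returns [slope, intercept]
    lower, upper = np.percentile(slopes, [2.5, 97.5])
    return slopes.std(ddof=1), (lower, upper)

# Hypothetical data with skewed (non-normal) errors
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=40)
y = 2.0 + 0.5 * x + (rng.exponential(1.0, size=40) - 1.0)
se, (lo, hi) = pairs_bootstrap_slope(x, y)
print(f"slope = {np.polyfit(x, y, 1)[0]:.3f}, bootstrap SE = {se:.3f}, "
      f"95% CI = ({lo:.3f}, {hi:.3f})")
```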

The other approach to dealing with non-normal error terms is to use asymptotic, or large-sample, theory. As a matter of fact, a glimpse of this was given in Appendix 3A.7 in Chapter 3, where we showed that the OLS estimators are consistent. As discussed in Appendix A, an estimator is consistent if it approaches the true value of the parameter it estimates as the sample size gets larger and larger (see Figure A.11 in Appendix A).

But how does that help us in hypothesis testing? Can we still use the t and F tests? It can be shown that under the Gauss-Markov assumptions the OLS estimators are asymptotically normally distributed with the means and variances discussed in Chapters 4 and 7. 53 As a result, the t and F tests developed under the normality assumption are approximately valid in large samples. The approximation becomes quite good as the sample size increases. 54
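The large-sample argument can be illustrated with a small Monte Carlo experiment. The sketch below rests on assumptions of our own choosing: a regressor held fixed in repeated sampling, a true slope of 0.5, and centered exponential errors (mean 0, variance 1, markedly skewed). It counts how often a nominal 5 percent t test rejects the true null hypothesis. With a moderately large n, the rejection rate should be close to 0.05 despite the non-normal errors.

```python
import numpy as np

def rejection_rate(n, reps=2000, beta2=0.5, seed=0):
    """Share of replications in which a nominal 5% two-sided test of the
    TRUE hypothesis beta2 = 0.5 rejects, when the errors are skewed."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                    # fixed in repeated sampling
    xc = x - x.mean()
    sxx = xc @ xc
    rejections = 0
    for _ in range(reps):
        u = rng.exponential(1.0, size=n) - 1.0    # mean 0, variance 1, skewed
        y = 1.0 + beta2 * x + u
        b2 = (xc @ y) / sxx                   # OLS slope
        b1 = y.mean() - b2 * x.mean()         # OLS intercept
        resid = y - b1 - b2 * x
        s2 = (resid @ resid) / (n - 2)        # unbiased error-variance estimate
        t = (b2 - beta2) / np.sqrt(s2 / sxx)
        rejections += int(abs(t) > 1.96)      # large-sample critical value
    return rejections / reps

for n in (10, 50, 200):
    print(n, rejection_rate(n))
```

For very small n, part of any excess rejection comes from using the normal critical value 1.96 instead of the t critical value, not only from the non-normal errors.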

2. Stochastic Explanatory Variables 

In Chapter 3 we introduced the classical linear (in parameter) regression model under some 
simplifying assumptions. One of the assumptions was that the explanatory variables, or 
regressors, were either fixed or non-stochastic, or if stochastic, they were independent of 
the error term. We called the former case the fixed regressor case and the latter the random 
regressor case. 


52 For an informal discussion, see Christopher Z. Mooney and Robert D. Duval, Bootstrapping: A 
Nonparametric Approach to Statistical Inference, Sage University Press, California, 1993. For a more 
formal textbook discussion, see Russell Davidson and James C. MacKinnon, Econometric Theory and 
Methods, Oxford University Press, New York, 2004, pp. 159-166. 

53 Recall the Gauss-Markov assumptions, namely, the expected value of the error term is zero, 
the error term and each of the explanatory variables are independent, the error variance is 
homoscedastic, and there is no autocorrelation in the error term. It is also assumed that the 
variance-covariance matrix of the explanatory variables is finite. We can also relax the condition of 
independence between the error term and the regressors and assume the weaker condition that they 
are uncorrelated. 

54 The proof of asymptotic normality of OLS estimators is beyond the scope of this book. See James H. 
Stock and Mark W. Watson, Introduction to Econometrics, 2d ed., Pearson/Addison Wesley, Boston, 
2007, pp. 710-711. 
In the fixed regressor case, we already know the properties of the OLS estimators (see Chapters 5 and 8). In the random regressor case, if we proceed with the assumption that our analysis is conditional on the given values of the regressors, the properties of OLS estimators that we have studied under the fixed regressor case continue to hold true.

If in the random regressor case we assume that these regressors and the error term are independently distributed, the OLS estimators are still unbiased but they are no longer efficient. 55

Things get complicated if the error term is not normally distributed, or the regressors are stochastic, or both. Here it is difficult to make any general statements regarding the finite-sample properties of the OLS estimators. However, under certain conditions, we can invoke the central limit theorem to establish the asymptotic normality of OLS estimators. Although beyond the scope of this book, the proofs can be found elsewhere. 56


13.13 A Word to the Practitioner 


We have covered a lot of ground in this chapter. There is no question that model building is 
an art as well as a science. A practical researcher may be bewildered by theoretical niceties 
and an array of diagnostic tools. But it is well to keep in mind Martin Feldstein’s caution 
that “The applied econometrician, like the theorist, soon discovers from experience that a 
useful model is not one that is ‘true’ or ‘realistic’ but one that is parsimonious, plausible 
and informative.” 57 

Peter Kennedy of Simon Fraser University in Canada advocates the following “Ten 
Commandments of Applied Econometrics”: 58 

1. Thou shalt use common sense and economic theory. 

2. Thou shalt ask the right questions (i.e., put relevance before mathematical elegance). 

3. Thou shalt know the context (do not perform ignorant statistical analysis). 

4. Thou shalt inspect the data. 

5. Thou shalt not worship complexity. Use the KISS principle, that is, keep it stochastically simple.

6. Thou shalt look long and hard at thy results. 

7. Thou shalt beware the costs of data mining. 

8. Thou shalt be willing to compromise (do not worship textbook prescriptions). 

9. Thou shalt not confuse significance with substance (do not confuse statistical significance with practical significance).

10. Thou shalt confess in the presence of sensitivity (that is, anticipate criticism). 

You may want to read Kennedy’s paper fully to appreciate the conviction with which he 
advocates the above ten commandments. Some of these commandments may sound 
tongue-in-cheek, but there is a grain of truth in each. 


55 For technical details, see William H. Greene, Econometric Analysis, 6th ed., Pearson/Prentice-Hall, 
New Jersey, 2008, pp. 49-50. 

56 See Greene, op. cit. 

57 Martin S. Feldstein, "Inflation, Tax Rules and Investment: Some Econometric Evidence," Econometrica, vol. 50, 1982, p. 829.

58 Peter Kennedy, op. cit., pp. 17-18. 
Summary and Conclusions


1. The assumption of the CLRM that the econometric model used in analysis is correctly 
specified has two meanings. One, there are no equation specification errors, and two, 
there are no model specification errors. In this chapter the major focus was on equation 
specification errors. 

2. The equation specification errors discussed in this chapter were (1) omission of an important variable(s), (2) inclusion of a superfluous variable(s), (3) adoption of the wrong functional form, (4) incorrect specification of the error term u_i, and (5) errors of measurement in the regressand and regressors.

3. When legitimate variables are omitted from a model, the consequences can be very serious: The OLS estimators of the variables retained in the model are not only biased but inconsistent as well. Additionally, the variances and standard errors of these coefficients are incorrectly estimated, thereby vitiating the usual hypothesis-testing procedures.

4. The consequences of including irrelevant variables in the model are fortunately less serious: The estimators of the coefficients of the relevant as well as “irrelevant” variables remain unbiased as well as consistent, and the error variance σ² remains correctly estimated. The only problem is that the estimated variances tend to be larger than necessary, thereby making for less precise estimation of the parameters. That is, the confidence intervals tend to be wider than necessary.

5. To detect equation specification errors, we considered several tests, such as (1) examination of residuals, (2) the Durbin-Watson d statistic, (3) Ramsey’s RESET test, and (4) the Lagrange multiplier test.

6. A special kind of specification error is errors of measurement in the values of the 
regressand and regressors. If there are errors of measurement in the regressand only, 
the OLS estimators are unbiased as well as consistent but they are less efficient. If 
there are errors of measurement in the regressors, the OLS estimators are biased as 
well as inconsistent. 

7. Even if errors of measurement are detected or suspected, the remedies are often not 
easy. The use of instrumental or proxy variables is theoretically attractive but not 
always practical. Thus it is very important in practice that the researcher be careful in 
stating the sources of his/her data, how they were collected, what definitions were used, 
etc. Data collected by official agencies often come with several footnotes and the 
researcher should bring those to the attention of the reader. 

8. Model mis-specification errors can be as serious as equation specification errors. In particular, we distinguished between nested and non-nested models. To decide on the appropriate model we discussed the non-nested, or encompassing, F test and the Davidson-MacKinnon J test and pointed out the limitations of each test.

9. In choosing an empirical model in practice, researchers have used a variety of criteria. We discussed some of these, such as the Akaike and Schwarz information criteria, Mallows’s C_p criterion, and the forecast χ² criterion. We discussed the advantages and disadvantages of these criteria and also warned the reader that these criteria are not absolute but are adjunct to a careful specification analysis.

10. We also discussed these additional topics: (1) outliers, leverage, and influence; 
(2) recursive least squares; and (3) Chow’s prediction failure test. We discussed the 
role of each in applied work. 

11. We briefly discussed two special cases, namely, non-normality of the stochastic error term and random regressors, and the role of asymptotic, or large, sample theory in situations where small, or finite, sample properties of OLS estimators cannot be established.



12. We concluded this chapter by discussing Peter Kennedy’s “ten commandments of applied econometrics.” The point of these commandments is to ask the researcher to look beyond the purely technical aspects of econometrics.


EXERCISES


Questions 

13.1. Refer to the demand function for chicken estimated in Eq. (8.6.23). Considering the attributes of a good model discussed in Section 13.1, could you say that this demand function is “correctly” specified?

13.2. Suppose that the true model is

$$Y_i = \beta_1 X_i + u_i \qquad (1)$$

but instead of fitting this regression through the origin you routinely fit the usual intercept-present model:

$$Y_i = \alpha_0 + \alpha_1 X_i + v_i \qquad (2)$$

Assess the consequences of this specification error.

13.3. Continue with Exercise 13.2 but assume that it is model (2) that is the truth. Discuss 
the consequences of fitting the mis-specified model (1). 

13.4. Suppose that the “true” model is

$$Y_i = \beta_1 + \beta_2 X_{2i} + u_i \qquad (1)$$

but we add an “irrelevant” variable $X_3$ to the model (irrelevant in the sense that the true $\beta_3$ coefficient attached to the variable $X_3$ is zero) and estimate

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + v_i \qquad (2)$$

a. Would the R² and the adjusted R² for model (2) be larger than those for model (1)?

b. Are the estimates of $\beta_1$ and $\beta_2$ obtained from model (2) unbiased?

c. Does the inclusion of the “irrelevant” variable $X_3$ affect the variances of $\hat\beta_1$ and $\hat\beta_2$?

13.5. Consider the following “true” (Cobb-Douglas) production function:

$$\ln Y_i = \alpha_0 + \alpha_1 \ln L_{1i} + \alpha_2 \ln L_{2i} + \alpha_3 \ln K_i + u_i$$

where Y = output
L_1 = production labor
L_2 = nonproduction labor
K = capital

But suppose the regression actually used in empirical investigation is

$$\ln Y_i = \beta_0 + \beta_1 \ln L_{1i} + \beta_2 \ln K_i + u_i$$

On the assumption that you have cross-sectional data on the relevant variables,

a. Will $E(\hat\beta_1) = \alpha_1$ and $E(\hat\beta_2) = \alpha_3$?

b. Will the answer in (a) hold if it is known that $L_2$ is an irrelevant input in the production function? Show the necessary derivations.

13.6. Refer to Eqs. (13.3.4) and (13.3.5). As you can see, $\hat\alpha_2$, although biased, has a smaller variance than $\hat\beta_2$, which is unbiased. How would you decide on the trade-off between bias and smaller variance? Hint: The MSE (mean-square error) for the two estimators is expressed as

$$\text{MSE}(\hat\alpha_2) = \frac{\sigma^2}{\sum x_{2i}^2} + \beta_3^2 b_{32}^2 = \text{sampling variance} + \text{square of bias}$$

$$\text{MSE}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)}$$

On MSE, see Appendix A.

13.7. Show that β estimated from either Eq. (13.5.1) or Eq. (13.5.3) provides an unbiased estimate of true β.

13.8. Following Friedman’s permanent income hypothesis, we may write

$$Y_i^{*} = \alpha + \beta X_i^{*} \qquad (1)$$

where $Y^{*}$ = “permanent” consumption expenditure and $X^{*}$ = “permanent” income. Instead of observing the “permanent” variables, we observe

$$Y_i = Y_i^{*} + u_i$$
$$X_i = X_i^{*} + v_i$$

where $Y_i$ and $X_i$ are the quantities that can be observed or measured and where $u_i$ and $v_i$ are measurement errors in $Y^{*}$ and $X^{*}$, respectively.

Using the observable quantities, we can write the consumption function as

$$Y_i = \alpha + \beta(X_i - v_i) + u_i = \alpha + \beta X_i + (u_i - \beta v_i) \qquad (2)$$

Assuming that (1) $E(u_i) = E(v_i) = 0$, (2) $\text{var}(u_i) = \sigma_u^2$ and $\text{var}(v_i) = \sigma_v^2$, (3) $\text{cov}(Y_i^{*}, u_i) = 0$, $\text{cov}(X_i^{*}, v_i) = 0$, and (4) $\text{cov}(u_i, X_i^{*}) = \text{cov}(v_i, Y_i^{*}) = \text{cov}(u_i, v_i) = 0$, show that in large samples $\hat\beta$ estimated from Eq. (2) can be expressed as

$$\text{plim}(\hat\beta) = \frac{\beta}{1 + (\sigma_v^2 / \sigma_{X^{*}}^2)}$$

a. What can you say about the nature of the bias in $\hat\beta$?

b. If the sample size increases indefinitely, will the estimated β tend toward equality with the true β?

13.9. Capital asset pricing model. The capital asset pricing model (CAPM) of modern investment theory postulates the following relationship between the average rate of return of a security (common stock), measured over a certain period, and the volatility of the security, called the beta coefficient (volatility is a measure of risk):

$$R_i = \alpha_1 + \alpha_2 \beta_i + u_i \qquad (1)$$

where $R_i$ = average rate of return of security i
$\beta_i$ = true beta coefficient of security i
$u_i$ = stochastic disturbance term

The true $\beta_i$ is not directly observable but is measured as follows:

$$r_{it} = \alpha_i + \beta_i^{*} r_{mt} + e_t \qquad (2)$$
where $r_{it}$ = rate of return of security i for time t
$r_{mt}$ = market rate of return for time t (this rate is the rate of return on some broad market index, such as the S&P index of industrial securities)
$e_t$ = residual term

and where $\beta_i^{*}$ is an estimate of the “true” beta coefficient. In practice, therefore, instead of estimating Eq. (1), one estimates

$$R_i = \alpha_1 + \alpha_2 \beta_i^{*} + u_i \qquad (3)$$

where the $\beta_i^{*}$ are obtained from regression (2). But since the $\beta_i^{*}$ are estimated, the relationship between the true β and β* can be written as

$$\beta_i^{*} = \beta_i + v_i \qquad (4)$$

where $v_i$ can be called the error of measurement.

a. What will be the effect of this error of measurement on the estimate of $\alpha_2$?

b. Will the $\alpha_2$ estimated from Eq. (3) provide an unbiased estimate of true $\alpha_2$? If not, is it a consistent estimate of $\alpha_2$? If not, what remedial measures do you suggest?

13.10. Consider the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + u_i \qquad (1)$$

To find out whether this model is mis-specified because it omits the variable $X_3$ from the model, you decide to regress the residuals obtained from model (1) on the variable $X_3$ only. (Note: There is an intercept in this regression.) The Lagrange multiplier (LM) test, however, requires you to regress the residuals from model (1) on both $X_2$ and $X_3$ and a constant. Why is your procedure likely to be inappropriate?*

13.11. Consider the model

$$Y_i = \beta_1 + \beta_2 X_i^{*} + u_i$$

In practice we measure $X_i^{*}$ by $X_i$ such that

a. $X_i = X_i^{*} + 5$

b. $X_i = 3X_i^{*}$

c. $X_i = (X_i^{*} + \varepsilon_i)$, where $\varepsilon_i$ is a purely random term with the usual properties

What will be the effect of these measurement errors on estimates of true $\beta_1$ and $\beta_2$?

13.12. Refer to the regression Eqs. (13.3.1) and (13.3.2). In a manner similar to Eq. (13.3.3), show that

$$E(\hat\alpha_1) = \beta_1 + \beta_3(\bar X_3 - b_{32}\bar X_2)$$

where $b_{32}$ is the slope coefficient in the regression of the omitted variable $X_3$ on the included variable $X_2$.

13.13. Critically evaluate the following view expressed by Leamer:†

My interest in metastatistics [i.e., theory of inference actually drawn from data] stems from my observations of economists at work. The opinion that econometric theory is irrelevant is held by an embarrassingly large share of the economic profession. The wide gap between econometric theory and econometric practice might be expected to cause professional tension. In fact, a calm equilibrium permeates our journals and our [professional] meetings. We comfortably divide ourselves into a celibate priesthood of statistical theorists, on the one hand, and a legion of inveterate sinner-data analysts, on the other. The priests are empowered to draw up lists of sins and are revered for the special talents they display. Sinners are not expected to avoid sins; they need only confess their errors openly.


*See Maddala, op. cit., p. 477.

†Edward E. Leamer, Specification Searches: Ad Hoc Inference with Nonexperimental Data, John Wiley & Sons, New York, 1978, p. vi.

13.14. Evaluate the following statement made by Henry Theil:*

Given the present state of the art, the most sensible procedure is to interpret confidence coefficients and significance limits liberally when confidence intervals and test statistics are computed from the final regression of a regression strategy in the conventional way. That is, a 95 percent confidence coefficient may actually be an 80 percent confidence coefficient and a 1 percent significance level may actually be a 10 percent level.

13.15. Commenting on the econometric methodology practiced in the 1950s and early 1960s, Blaug stated:†

. . . much of it [i.e., empirical research] is like playing tennis with the net down: instead of attempting to refute testable predictions, modern economists all too frequently are satisfied to demonstrate that the real world conforms to their predictions, thus replacing falsification [à la Popper], which is difficult, with verification, which is easy.

Do you agree with this view? You may want to peruse Blaug’s book to learn more about his views.

13.16. According to Blaug, “There is no logic of proof but there is logic of disproof.”‡ What does he mean by this?

13.17. Refer to the St. Louis model discussed in the text. Keeping in mind the problems associated with the nested F test, critically evaluate the results presented in regression (13.8.4).

13.18. Suppose the true model is

$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_i$$

but you estimate

$$Y_i = \alpha_1 + \alpha_2 X_i + v_i$$

If you use observations of Y at X = −3, −2, −1, 0, 1, 2, 3, and estimate the “incorrect” model, what bias will result in these estimates?§

*Henry Theil, Principles of Econometrics, John Wiley & Sons, New York, 1971, pp. 605-606.

†Mark Blaug, The Methodology of Economics: Or How Economists Explain, Cambridge University Press, New York, 1980, p. 256.

‡Ibid., p. 14.

§Adapted from G. A. F. Seber, Linear Regression Analysis, John Wiley & Sons, New York, 1977, p. 176.


13.19. To see if the variable $X_i^2$ belongs in the model $Y_i = \beta_1 + \beta_2 X_i + u_i$, Ramsey’s RESET test would estimate the linear model, obtaining the estimated $\hat Y_i$ values from this model [i.e., $\hat Y_i = \hat\beta_1 + \hat\beta_2 X_i$], then estimate the model $Y_i = \alpha_1 + \alpha_2 X_i + \alpha_3 \hat Y_i^2 + v_i$ and test the significance of $\alpha_3$. Prove that, if $\hat\alpha_3$ turns out to be statistically significant in the preceding (RESET) equation, it is the same thing as estimating the following model directly: $Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + u_i$. (Hint: Substitute for $\hat Y_i$ in the RESET regression.)*

13.20. State with reason whether the following statements are true or false.†

a. An observation can be influential but not an outlier. 

b. An observation can be an outlier but not influential. 

c. An observation can be both influential and an outlier. 

d. If in the model $Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + u_i$, $\hat\beta_3$ turns out to be statistically
significant, we should retain the linear term $X_i$ even if $\hat\beta_2$ is statistically
insignificant.

e. If you estimate the model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ or $Y_i = \alpha_1 + \alpha_2 x_{2i} +
\alpha_3 x_{3i} + u_i$ by OLS, the estimated regression line is the same, where $x_{2i} =
(X_{2i} - \bar X_2)$ and $x_{3i} = (X_{3i} - \bar X_3)$.

Empirical Exercises 

13.21. Use the data for the demand for chicken given in Exercise 7.19. Suppose you are 
told that the true demand function is 

$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + \beta_4 \ln X_{6t} + u_t$   (1)

but you think differently and estimate the following demand function:

$\ln Y_t = \alpha_1 + \alpha_2 \ln X_{2t} + \alpha_3 \ln X_{3t} + v_t$   (2)

where Y = per capita consumption of chickens (lb)

X 2 = real disposable per capita income 

X 3 = real retail price of chickens 

X 6 = composite real price of chicken substitutes 

a. Carry out RESET and LM tests of specification errors, assuming the demand 
function (1) just given is the truth. 

b. Suppose $\hat\beta_4$ in Eq. (1) turns out to be statistically insignificant. Does that mean
there is no specification error if we fit Eq. (2) to the data?

c. If $\hat\beta_4$ turns out to be insignificant, does that mean one should not introduce the
price of a substitute product(s) as an argument in the demand function?

13.22. Continue with Exercise 13.21. Strictly for pedagogical purposes, assume that 
model (2) is the true demand function. 

a. If we now estimate model (1), what type of specification error is committed in 
this instance? 

b. What are the theoretical consequences of this specification error? Illustrate with 
the data at hand. 

13.23. The true model is

$Y_i^* = \beta_1 + \beta_2 X_i^* + u_i$   (1)

but because of errors of measurement you estimate

$Y_i = \alpha_1 + \alpha_2 X_i + v_i$   (2)

where $Y_i = Y_i^* + \varepsilon_i$ and $X_i = X_i^* + w_i$, and where $\varepsilon_i$ and $w_i$ are measurement errors.
Using the data given in Table 13.2, document the consequences of estimating
model (2) instead of the true model (1).

*Adapted from Kerry Peterson, op. cit., pp. 184-185.

†Adapted from Norman R. Draper and Harry Smith, op. cit., pp. 606-607.

13.24. Monte Carlo experiment.* Ten individuals had weekly permanent income (in dollars) as
follows: 200, 220, 240, 260, 280, 300, 320, 340, 380, and 400. Permanent consumption
($Y^*$) was related to permanent income $X^*$ as

$Y_i^* = 0.8 X_i^*$   (1)

Each of these individuals had transitory income equal to 100 times a random number
$u_i$ drawn from a normal population with mean = 0 and $\sigma^2 = 1$ (i.e., a standard
normal variable). Assume that there is no transitory component in consumption.
Thus, measured consumption and permanent consumption are the same.

a. Draw 10 random numbers from a normal population with zero mean and unit
variance and obtain 10 numbers for measured income $X_i$ (= $X_i^* + 100u_i$).

b. Regress permanent (= measured) consumption on measured income using the 
data obtained in (a) and compare your results with those shown in Eq. (1). A 
priori, the intercept should be zero (why?). Is that the case? Why or why not? 

c. Repeat (a) 100 times and obtain 100 regressions as shown in (b) and compare 
your results with the true regression (1). What general conclusions do you draw? 
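Parts (a) to (c) can be scripted in a few lines. The sketch below is one possible setup, assuming NumPy, an arbitrary random seed chosen for reproducibility, and `np.polyfit` as the OLS routine; `monte_carlo_measurement_error` is a hypothetical name. Because measured income contains a transitory component that is unrelated to consumption, one would expect the average slope to fall short of 0.8 and the average intercept to be well above zero.

```python
import numpy as np

def monte_carlo_measurement_error(n_reps=100, seed=0):
    """Sketch of Exercise 13.24: Y* = 0.8 X*, measured X = X* + 100u, u ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x_perm = np.array([200, 220, 240, 260, 280, 300, 320, 340, 380, 400], dtype=float)
    y = 0.8 * x_perm                        # measured = permanent consumption
    intercepts, slopes = [], []
    for _ in range(n_reps):
        u = rng.standard_normal(x_perm.size)
        x_meas = x_perm + 100.0 * u         # measured income, part (a)
        slope, intercept = np.polyfit(x_meas, y, deg=1)   # OLS of Y on measured X, part (b)
        intercepts.append(intercept)
        slopes.append(slope)
    return np.array(intercepts), np.array(slopes)

intercepts, slopes = monte_carlo_measurement_error()   # part (c): 100 replications
print(f"mean intercept: {intercepts.mean():.2f}, mean slope: {slopes.mean():.3f}")
```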

13.25. Refer to Exercise 8.26. With the definitions of the variables given there, consider
the following two models to explain Y:

Model A: $Y_t = \alpha_1 + \alpha_2 X_{3t} + \alpha_3 X_{4t} + \alpha_4 X_{6t} + u_t$
Model B: $Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{5t} + \beta_4 X_{6t} + u_t$

Using the nested F test, how will you choose between the two models? 

13.26. Continue with Exercise 13.25. Using the J test, how would you decide between the
two models?

13.27. Refer to Exercise 7.19, which is concerned with the demand for chicken in the 
United States. There you were given five models. 

a. What is the difference between model 1 and model 2? If model 2 is correct and
you estimate model 1, what kind of error is committed? Which test would you
apply: equation specification error or model selection error? Show the
necessary calculations.

b. Between models 1 and 5, which would you choose? Which test(s) do you use 
and why? 

13.28. Refer to Table 8.11, which gives data on personal savings (Y) and personal
disposable income (X) for the period 1970-2005. Now consider the following models:

Model A: $Y_t = \alpha_1 + \alpha_2 X_t + \alpha_3 X_{t-1} + u_t$
Model B: $Y_t = \beta_1 + \beta_2 X_t + \beta_3 Y_{t-1} + u_t$

How would you choose between these two models? State clearly the test
procedure(s) you use and show all the calculations. Suppose someone contends that the
interest rate variable belongs in the savings function. How would you test this?
Collect data on the 3-month Treasury bill rate as a proxy for the interest rate and
demonstrate your answer.


*Adapted from Christopher Dougherty, Introduction to Econometrics, Oxford University Press,
New York, 1992, pp. 253-256.

13.29. Use the data in Exercise 13.28. To familiarize yourself with recursive least squares,
estimate the savings functions for 1970-1981, 1970-1985, 1970-1990, and 1970-1995.
Comment on the stability of estimated coefficients in the savings functions.

13.30. Continue with Exercise 13.29, but now use the updated data in Table 8.10. 

a. Suppose you estimate the savings function for 1970-1981. Using the parameters
thus estimated and the personal disposable income data from 1982-2000, compute
the predicted savings for the latter period and use Chow’s prediction failure
test to find out if it rejects the hypothesis that the savings function between the
two time periods has not changed.

b. Now estimate the savings function for the data from 2000-2005. Compare the
results to the function for the 1982-2000 period using the same method as above
(Chow’s prediction failure test). Is there a significant change in the savings
function between the two periods?

13.31. Omission of a variable in the k-variable regression model. Refer to Eq. (13.3.3),
which shows the bias in omitting the variable $X_3$ from the model $Y_i = \beta_1 +
\beta_2 X_{2i} + \beta_3 X_{3i} + u_i$. This can be generalized as follows: In the k-variable model
$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$, suppose we omit the variable $X_k$. Then it can
be shown that the omitted variable bias of the slope coefficient of included variable
$X_j$ is:

$E(\hat\beta_j) = \beta_j + \beta_k b_{kj}$,   $j = 2, 3, \ldots, (k-1)$

where $b_{kj}$ is the (partial) slope coefficient of $X_j$ in the auxiliary regression of the
excluded variable $X_k$ on all the explanatory variables included in the model.*

Refer to Exercise 13.21. Find out the bias of the coefficients in Eq. (1) if we
excluded the variable $\ln X_6$ from the model. Is this exclusion serious? Show the
necessary calculations.
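In any single sample, the OLS counterpart of this formula holds as an exact algebraic identity: the short-regression coefficient equals the long-regression coefficient plus the long-regression coefficient of $X_k$ times the auxiliary-regression coefficient $b_{kj}$. The NumPy sketch below checks this on simulated data; the data-generating values are arbitrary assumptions, not the chicken-demand data of Exercise 13.21, and `ols` is a hypothetical helper.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients; X must already include a column of ones."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = 0.6 * x2 - 0.3 * x3 + rng.normal(size=n)      # X_k, correlated with the included X's
y = 1.0 + 2.0 * x2 - 1.0 * x3 + 1.5 * x4 + rng.normal(size=n)

ones = np.ones(n)
X_short = np.column_stack([ones, x2, x3])           # model with X_k omitted
X_long = np.column_stack([ones, x2, x3, x4])        # model including X_k

b_long = ols(X_long, y)
b_short = ols(X_short, y)
b_aux = ols(X_short, x4)                            # auxiliary regression of X_k on the included X's

# Sample counterpart of E(beta_j) = beta_j + beta_k * b_kj (it also holds for the intercept)
implied = b_long[:3] + b_long[3] * b_aux
print(np.round(b_short, 4))
print(np.round(implied, 4))
```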


Appendix 13A


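13A.1 The Proof that $E(b_{12}) = \beta_2 + \beta_3 b_{32}$ [Equation (13.3.3)]

In the deviation form the three-variable population regression model can be written as

$y_i = \beta_2 x_{2i} + \beta_3 x_{3i} + (u_i - \bar u)$   (1)

First multiplying by $x_{2i}$ and then by $x_{3i}$ and summing over the observations, the usual normal equations are

$\sum y_i x_{2i} = \beta_2 \sum x_{2i}^2 + \beta_3 \sum x_{2i} x_{3i} + \sum x_{2i}(u_i - \bar u)$   (2)

$\sum y_i x_{3i} = \beta_2 \sum x_{2i} x_{3i} + \beta_3 \sum x_{3i}^2 + \sum x_{3i}(u_i - \bar u)$   (3)

Dividing Eq. (2) by $\sum x_{2i}^2$ on both sides, we obtain

$\frac{\sum y_i x_{2i}}{\sum x_{2i}^2} = \beta_2 + \beta_3 \frac{\sum x_{2i} x_{3i}}{\sum x_{2i}^2} + \frac{\sum x_{2i}(u_i - \bar u)}{\sum x_{2i}^2}$   (4)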


Now recalling that

$b_{32} = \frac{\sum x_{2i} x_{3i}}{\sum x_{2i}^2}$ and $b_{12} = \frac{\sum y_i x_{2i}}{\sum x_{2i}^2}$,

Eq. (4) can be written as

$b_{12} = \beta_2 + \beta_3 b_{32} + \frac{\sum x_{2i}(u_i - \bar u)}{\sum x_{2i}^2}$   (5)

Taking the expected value of Eq. (5) on both sides, we finally obtain

$E(b_{12}) = \beta_2 + \beta_3 b_{32}$   (6)

where use is made of the facts that (a) for a given sample, $b_{32}$ is a known fixed quantity, (b) $\beta_2$ and
$\beta_3$ are constants, and (c) $u_i$ is uncorrelated with $X_{2i}$ (as well as $X_{3i}$).

*This can be generalized to the case where more than one relevant X variable is excluded from the
model. On this, see Chandan Mukherjee et al., op. cit., p. 215.


13A.2 The Consequences of Including an Irrelevant 
Variable: The Unbiasedness Property 


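For the true model (13.3.6), we have

$\hat\beta_2 = \frac{\sum y_i x_{2i}}{\sum x_{2i}^2}$   (1)

and we know that it is unbiased.

For the model (13.3.7), we obtain

$\hat\alpha_2 = \frac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}$   (2)

Now the true model in deviation form is

$y_i = \beta_2 x_{2i} + (u_i - \bar u)$   (3)

Substituting for $y_i$ from model (3) into model (2) and simplifying, we obtain

$E(\hat\alpha_2) = \beta_2 \frac{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2} = \beta_2$   (4)

that is, $\hat\alpha_2$ remains unbiased.

We also obtain

$\hat\alpha_3 = \frac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum y_i x_{2i})(\sum x_{2i} x_{3i})}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}$   (5)

Substituting for $y_i$ from model (3) into model (5) and simplifying, we obtain

$E(\hat\alpha_3) = \beta_2 \frac{(\sum x_{2i} x_{3i})(\sum x_{2i}^2) - (\sum x_{2i}^2)(\sum x_{2i} x_{3i})}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2} = 0$   (6)

which is its value in the true model since $X_3$ is absent from the true model.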
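Results (4) and (6) can be checked by simulation: hold $X_2$ and $X_3$ fixed, generate Y repeatedly from a true model that excludes $X_3$, and average the OLS estimates from the overfitted model. The parameter values, sample size, number of replications, and seed below are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 2000
beta1, beta2 = 1.0, 0.5                         # assumed true values; X3 is irrelevant
x2 = rng.normal(size=n)
x3 = 0.7 * x2 + rng.normal(size=n)              # irrelevant regressor, correlated with X2
X_over = np.column_stack([np.ones(n), x2, x3])  # overfitted model; X's fixed in repeated samples

estimates = np.empty((reps, 3))
for r in range(reps):
    y = beta1 + beta2 * x2 + rng.normal(size=n)  # data generated by the true model
    coef, *_ = np.linalg.lstsq(X_over, y, rcond=None)
    estimates[r] = coef

alpha_bar = estimates.mean(axis=0)               # averages of alpha_1, alpha_2, alpha_3
print(np.round(alpha_bar, 3))
```

The averages should lie near (1.0, 0.5, 0.0), in line with $E(\hat\alpha_2) = \beta_2$ and $E(\hat\alpha_3) = 0$.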

13A.3 The Proof of Equation (13.5.10) 


We have

$Y_i = \alpha + \beta X_i^* + u_i$   (1)

$X_i = X_i^* + w_i$   (2)

Therefore, in deviation form we obtain

$y_i = \beta x_i^* + (u_i - \bar u)$   (3)

$x_i = x_i^* + (w_i - \bar w)$   (4)

Now when we use

$Y_i = \alpha + \beta X_i + u_i$   (5)

we obtain

$\hat\beta = \frac{\beta \sum x_i^{*2} + \beta \sum x_i^*(w_i - \bar w) + \sum x_i^*(u_i - \bar u) + \sum (u_i - \bar u)(w_i - \bar w)}{\sum x_i^{*2} + 2\sum x_i^*(w_i - \bar w) + \sum (w_i - \bar w)^2}$

We cannot take the expectation of this expression directly, because the expectation of the ratio of two
random variables is not equal to the ratio of their expectations (note: the expectations operator E is a
linear operator). Instead, we first divide each term of the numerator and the denominator by n and take
the probability limit, plim (see Appendix A for details of plim), of

$\hat\beta = \frac{(1/n)\left[\beta \sum x_i^{*2} + \beta \sum x_i^*(w_i - \bar w) + \sum x_i^*(u_i - \bar u) + \sum (u_i - \bar u)(w_i - \bar w)\right]}{(1/n)\left[\sum x_i^{*2} + 2\sum x_i^*(w_i - \bar w) + \sum (w_i - \bar w)^2\right]}$

Now the probability limit of the ratio of two variables is the ratio of their probability limits (provided
the plim of the denominator is nonzero). Applying this rule and taking plim of each term, we obtain

$\text{plim}\ \hat\beta = \frac{\beta \sigma_{X^*}^2}{\sigma_{X^*}^2 + \sigma_w^2}$

where $\sigma_{X^*}^2$ and $\sigma_w^2$ are the variances of $X^*$ and w as the sample size increases indefinitely, and where
we have used the fact that as the sample size increases indefinitely there is no correlation between the
errors u and w as well as between them and the true $X^*$. From the preceding expression, we finally obtain

$\text{plim}\ \hat\beta = \beta \left[\frac{1}{1 + \sigma_w^2/\sigma_{X^*}^2}\right]$

which is the required result.
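The probability-limit result can be illustrated by simulation with a very large sample, where the OLS slope on the mismeasured regressor should sit close to $\beta/(1 + \sigma_w^2/\sigma_{X^*}^2)$ rather than β. All parameter values and the seed below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000                                    # large n, standing in for the plim
alpha, beta = 1.0, 2.0                         # assumed true parameters
sigma_xstar, sigma_w, sigma_u = 2.0, 1.0, 1.0  # assumed standard deviations of X*, w, u

x_star = rng.normal(5.0, sigma_xstar, n)       # true regressor X*
w = rng.normal(0.0, sigma_w, n)                # measurement error
u = rng.normal(0.0, sigma_u, n)                # equation error
y = alpha + beta * x_star + u                  # model (1)
x = x_star + w                                 # observed regressor, Eq. (2)

xd, yd = x - x.mean(), y - y.mean()
beta_hat = (xd @ yd) / (xd @ xd)               # OLS slope of Y on the observed X
beta_plim = beta / (1.0 + sigma_w**2 / sigma_xstar**2)   # 2 / 1.25 = 1.6
print(round(beta_hat, 3), beta_plim)
```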
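13A.4 The Proof of Equation (13.6.2)

Since there is no intercept in the model, the estimate of α, according to the formula for the regression
through the origin, is as follows:

$\hat\alpha = \frac{\sum X_i Y_i}{\sum X_i^2}$   (1)

Substituting for Y from the true model (13.2.8), we obtain

$\hat\alpha = \frac{\sum X_i(\beta X_i u_i)}{\sum X_i^2} = \beta \frac{\sum X_i^2 u_i}{\sum X_i^2}$   (2)

Statistical theory shows that if $\ln u_i \sim N(0, \sigma^2)$, then

$u_i \sim \text{log normal}\left[e^{\sigma^2/2},\ e^{\sigma^2}(e^{\sigma^2} - 1)\right]$   (3)

where the two bracketed terms are the mean and the variance of $u_i$. Therefore,

$E(\hat\alpha) = \beta\, E\left(\frac{\sum X_i^2 u_i}{\sum X_i^2}\right) = \beta\, \frac{\sum X_i^2 E(u_i)}{\sum X_i^2} = \beta e^{\sigma^2/2}$

where use is made of the fact that the X’s are nonstochastic and each $u_i$ has an expected value of
$e^{\sigma^2/2}$.

Since $E(\hat\alpha) \neq \beta$, $\hat\alpha$ is a biased estimator of β.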
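A small simulation illustrates the result just derived: with a lognormal multiplicative error, the average of the regression-through-the-origin estimates should be near $\beta e^{\sigma^2/2}$ rather than β. The values of β and σ, the X grid, the number of replications, and the seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, sigma = 0.8, 0.5                 # assumed true slope and s.d. of ln(u)
x = np.linspace(1.0, 10.0, 20)         # nonstochastic X, fixed in repeated samples
reps = 20_000

# ln(u) ~ N(0, sigma^2), so u is lognormal with mean exp(sigma^2 / 2)
u = np.exp(rng.normal(0.0, sigma, size=(reps, x.size)))
y = beta * x * u                       # multiplicative true model: Y = beta * X * u
alpha_hats = (y @ x) / (x @ x)         # regression through the origin, one estimate per replication

print(round(alpha_hats.mean(), 4), round(beta * np.exp(sigma**2 / 2), 4))
```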






Part 3

Topics in Econometrics


In Part 1 we introduced the classical linear regression model with all its assumptions. 
In Part 2 we examined in detail the consequences that ensue when one or more of the 
assumptions are not satisfied and what can be done about them. In Part 3 we study some 
selected but commonly encountered econometric techniques. In particular, we discuss 
these topics: (1) nonlinear-in-the-parameter regression models, (2) qualitative response 
regression models, (3) panel data regression models, and (4) dynamic econometric 
models. 

In Chapter 14, we consider models that are intrinsically nonlinear in the parameters. 
With the ready availability of software packages, it is no longer a big challenge to estimate 
such models. Although the underlying mathematics may elude some readers, the basic 
ideas of nonlinear-in-the-parameter regression models can be explained intuitively. With 
suitable examples, this chapter shows how such models are estimated and interpreted. 

In Chapter 15, we consider regression models in which the dependent variable is qualitative
in nature. This chapter therefore complements Chapter 9, where we discussed models in
which the explanatory variables were qualitative in nature. The basic thrust of this chapter is 
on developing models in which the regressand is of the yes or no type. Since ordinary least 
squares (OLS) poses several problems in estimating such models, several alternatives have 
been developed. In this chapter we consider two such alternatives, namely, the logit model 
and the probit model. This chapter also discusses several variants of the qualitative response 
models, such as the Tobit model and the Poisson regression model. Several extensions of 
the qualitative response models are also briefly discussed, such as the ordered probit, 
ordered logit, and multinomial logit. 

In Chapter 16 we discuss panel data regression models. Such models combine time 
series and cross-section observations. Although by combining such observations we increase 
the sample size, panel data regression models pose several estimation challenges. In this 
chapter we discuss only the essentials of such models and guide the reader to the appropriate 
resources for further study. 
In Chapter 17, we consider regression models that include current as well as past, or
lagged, values of the explanatory variables in addition to models that include the lagged
value(s) of the dependent variable as one of the explanatory variables. These models are
called, respectively, distributed lag and autoregressive models. Although such models are
extremely useful in empirical econometrics, they pose some special estimation problems
because they violate one or more assumptions of the classical regression model. We
consider these special problems in the context of the Koyck, the adaptive-expectations (AE),
and the partial-adjustment models. We also note the criticism leveled against the AE model
by the advocates of the so-called rational expectations (RE) school.


Chapter 14

Nonlinear Regression Models

The major emphasis of this book is on linear regression models, that is, models that are 
linear in the parameters and/or models that can be transformed so that they are linear in 
the parameters. On occasion, however, for theoretical or empirical reasons we have to
consider models that are nonlinear in the parameters.¹ In this chapter we take a look at
such models and study their special features. 

14.1 Intrinsically Linear and Intrinsically 
Nonlinear Regression Models 

When we started our discussion of linear regression models in Chapter 2, we stated that our 
concern in this book is basically with models that are linear in the parameters; they may or 
may not be linear in the variables. If you refer to Table 2.3, you will see that a model that is 
linear in the parameters as well as the variables is a linear regression model and so is a 
model that is linear in the parameters but nonlinear in the variables. On the other hand, if a 
model is nonlinear in the parameters it is a nonlinear (in-the-parameter) regression model 
whether the variables of such a model are linear or not. 

However, one has to be careful here, for some models look nonlinear in the parameters 
but are inherently or intrinsically linear because with suitable transformation they can be 
made linear-in-the-parameter regression models. But if such models cannot be linearized in 
the parameters, they are called intrinsically nonlinear regression models. From now on 
when we talk about a nonlinear regression model, we mean that it is intrinsically nonlinear. 
For brevity, we will refer to such a model as an NLRM (nonlinear regression model).

To drive home the distinction between the two, let us revisit Exercises 2.6 and 2.7. In
Exercise 2.6, Models a, b, c, and e are linear regression models because they are all linear in
the parameters. Model d is a mixed bag, for it is linear in $\beta_2$ but not in $\beta_1$, which enters as
$\ln\beta_1$. But if we let $\alpha = \ln\beta_1$, then this model is linear in $\alpha$ and $\beta_2$.

In Exercise 2.7, Models d and e are intrinsically nonlinear because there is no simple way
to linearize them. Model c is obviously a linear regression model. What about Models a
and b? Taking the logarithms on both sides of Model a, we obtain $\ln Y_i = \beta_1 + \beta_2 X_i + u_i$, which is
linear in the parameters. Hence Model a is intrinsically a linear regression model. Model b
is an example of the logistic (probability) distribution function, and we will study this in
Chapter 15. On the surface, it seems that this is a nonlinear regression model. But a simple
mathematical trick will render it a linear regression model, namely,

$\ln\left(\frac{1 - Y_i}{Y_i}\right) = \beta_1 + \beta_2 X_i + u_i$   (14.1.1)

Therefore, Model b is intrinsically linear. We will see the utility of models like Eq. (14.1.1)
in the next chapter.

¹We noted in Chapter 4 that under the assumption of a normally distributed error term, the OLS
estimators are not only BLUE but are BUE (best unbiased estimator) in the entire class of estimators,
linear or not. But if we drop the assumption of normality, as Davidson and MacKinnon note, it is
possible to obtain nonlinear and/or biased estimators that may perform better than the OLS
estimators. See Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, Oxford
University Press, New York, 1993, p. 161.

Consider now the famous Cobb-Douglas (C-D) production function. Letting
Y = output, $X_2$ = labor input, and $X_3$ = capital input, we will write this function in three
different ways:

$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i}$   (14.1.2)

or,

$\ln Y_i = \alpha + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i$   (14.1.2a)

where $\alpha = \ln\beta_1$. Thus in this format the C-D function is intrinsically linear.

Now consider this version of the C-D function:

$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} u_i$   (14.1.3)

or,

$\ln Y_i = \alpha + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + \ln u_i$   (14.1.3a)

where $\alpha = \ln\beta_1$. This model too is linear in the parameters.

But now consider the following version of the C-D function: 

$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} + u_i$   (14.1.4)

As we just noted, C-D versions (14.1.2a) and (14.1.3a) are intrinsically linear
(in-the-parameter) regression models, but there is no way to transform Eq. (14.1.4) so that the
transformed model can be made linear in the parameters.² Therefore, Eq. (14.1.4) is intrinsically
a nonlinear regression model.

Another well-known but intrinsically nonlinear function is the constant elasticity of
substitution (CES) production function, of which the Cobb-Douglas production function is a
special case. The CES production function takes the following form:

$Y_i = A\left[\delta K_i^{-\beta} + (1 - \delta) L_i^{-\beta}\right]^{-1/\beta}$   (14.1.5)

where Y = output, K = capital input, L = labor input, A = scale parameter,
δ = distribution parameter (0 < δ < 1), and β = substitution parameter (β > -1).³ No
matter in what form you enter the stochastic error term $u_i$ in this production function, there
is no way to make it a linear (in-parameter) regression model. It is intrinsically a nonlinear
regression model.


²If you try to log-transform the model, it will not work because $\ln(A + B) \neq \ln A + \ln B$.

³For properties of the CES production function, see Michael D. Intriligator, Ronald Bodkin, and Cheng
Hsiao, Econometric Models, Techniques, and Applications, 2d ed., Prentice Hall, 1996, pp. 294-295.

14.2 Estimation of Linear and Nonlinear Regression Models

To see the difference in estimating linear and nonlinear regression models, consider the
following two models:

$Y_i = \beta_1 + \beta_2 X_i + u_i$   (14.2.1)

$Y_i = \beta_1 e^{\beta_2 X_i} + u_i$   (14.2.2)

By now you know that Eq. (14.2.1) is a linear regression model, whereas Eq. (14.2.2) is a 
nonlinear regression model. Regression (14.2.2) is known as the exponential regression 
model and is often used to measure the growth of a variable, such as population, GDP, or 
money supply. 

Suppose we consider estimating the parameters of the two models by ordinary least 
squares (OLS). In OLS we minimize the residual sum of squares (RSS), which for model 
(14.2.1) is: 

$\sum \hat u_i^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2$   (14.2.3)

where as usual $\hat\beta_1$ and $\hat\beta_2$ are the OLS estimators of the true β’s. Differentiating the
preceding expression with respect to the two unknowns, we obtain the normal equations shown in
Eqs. (3.1.4) and (3.1.5). Solving these equations simultaneously, we obtain the OLS
estimators given in Eqs. (3.1.6) and (3.1.7). Observe very carefully that in these equations the
unknowns (β’s) are on the left-hand side and the knowns (X and Y) are on the right-hand
side. As a result we get explicit solutions of the two unknowns in terms of our data.

Now see what happens if we try to minimize the RSS of Eq. (14.2.2). As shown in 
Appendix 14A, Section 14A.1, the normal equations corresponding to Eqs. (3.1.4) and 
(3.1.5) are as follows: 


$\sum Y_i e^{\hat\beta_2 X_i} = \hat\beta_1 \sum e^{2\hat\beta_2 X_i}$   (14.2.4)

$\sum Y_i X_i e^{\hat\beta_2 X_i} = \hat\beta_1 \sum X_i e^{2\hat\beta_2 X_i}$   (14.2.5)

Unlike the normal equations in the case of the linear regression model, the normal
equations for nonlinear regression have the unknowns (the β’s) both on the left- and right-hand
sides of the equations. As a consequence, we cannot obtain explicit solutions of the
unknowns in terms of the known quantities. To put it differently, the unknowns are expressed
in terms of themselves and the data! Therefore, although we can apply the method of least
squares to estimate the parameters of the nonlinear regression models, we cannot obtain
explicit solutions of the unknowns. Incidentally, OLS applied to a nonlinear regression
model is called nonlinear least squares (NLLS). So, what is the solution? We take this
question up next.

14.3 Estimating Nonlinear Regression Models: The Trial-and-Error Method


To set the stage, let us consider a concrete example. The data in Table 14.1 relate to the
management fees that a leading mutual fund in the United States pays to its investment
advisors to manage its assets. The fees paid depend on the net asset value of the fund. As you
can see, the higher the net asset value of the fund, the lower are the advisory fees, as
Figure 14.1 shows clearly.
TABLE 14.1
Advisory Fees Charged and Asset Size

Observation   Fee, %   Asset*
 1            0.520     0.5
 2            0.508     5.0
 3            0.484    10
 4            0.46     15
 5            0.4398   20
 6            0.4238   25
 7            0.4115   30
 8            0.402    35
 9            0.3944   40
10            0.388    45
11            0.3825   55
12            0.3738   60

*Asset represents net asset value, billions of dollars.

FIGURE 14.1 Relationship of advisory fees to fund assets. (Horizontal axis: Asset, billions of dollars.)


To see how the exponential regression model in Eq. (14.2.2) fits the data given in Table
14.1, we can proceed by trial and error. Suppose we assume that initially $\beta_1 = 0.45$ and
$\beta_2 = 0.01$. These are pure guesses, sometimes based on prior experience or prior empirical
work or obtained by just fitting a linear regression model even though it may not be
appropriate. At this stage do not worry about how these values are obtained.

Since we know the values of $\beta_1$ and $\beta_2$, we can write Eq. (14.2.2) as:

$u_i = Y_i - \beta_1 e^{\beta_2 X_i} = Y_i - 0.45 e^{0.01 X_i}$   (14.3.1)

Therefore,

$\sum u_i^2 = \sum (Y_i - 0.45 e^{0.01 X_i})^2$   (14.3.2)

Since Y, X, $\beta_1$, and $\beta_2$ are known, we can easily find the error sum of squares in Eq. (14.3.2).⁴
Remember that in OLS our objective is to find those values of the unknown parameters that
will make the error sum of squares as small as possible. This will happen if the estimated
Y values from the model are as close as possible to the actual Y values. With the given
values, we obtain $\sum u_i^2 = 0.3044$. But how do we know that this is the least possible error
sum of squares that we can obtain? What happens if you choose other values for $\beta_1$ and
$\beta_2$, say, 0.50 and -0.01, respectively? Repeating the procedure just laid down, we find that
we now obtain $\sum u_i^2 = 0.0073$. Obviously, this error sum of squares is much smaller than
the one obtained before, namely, 0.3044. But how do we know that we have reached the
lowest possible error sum of squares, if by choosing yet another set of values for the β’s, we
will obtain yet another error sum of squares?

⁴Note that we call $\sum u_i^2$ the error sum of squares and not the usual residual sum of squares because
the values of the parameters are assumed to be known.

As you can see, such a trial-and-error, or iterative, process can be easily implemented.
And if one has infinite time and infinite patience, the trial-and-error process may ultimately
produce values of $\beta_1$ and $\beta_2$ that guarantee the lowest possible error sum of squares. But
you might ask, how did we go from ($\beta_1 = 0.45$; $\beta_2 = 0.01$) to ($\beta_1 = 0.50$; $\beta_2 = -0.01$)?
Clearly, we need some kind of algorithm that will tell us how we go from one set of values
of the unknowns to another set before we stop. Fortunately such algorithms are available,
and we discuss them in the next section.
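The error sum of squares in Eq. (14.3.2) is easy to compute for any pair of trial values. The NumPy sketch below uses Table 14.1 as transcribed above; `error_ss` is a hypothetical helper, and the printed sums are whatever that transcription yields, so they are not guaranteed to reproduce the rounded figures quoted in the text.

```python
import numpy as np

# Table 14.1: advisory fee (%) and net asset value (billions of dollars)
fee = np.array([0.520, 0.508, 0.484, 0.46, 0.4398, 0.4238,
                0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738])
asset = np.array([0.5, 5.0, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60])

def error_ss(b1, b2, x=asset, y=fee):
    """Error sum of squares of Eq. (14.2.2) for given trial values of beta1 and beta2."""
    return np.sum((y - b1 * np.exp(b2 * x)) ** 2)

for b1, b2 in [(0.45, 0.01), (0.50, -0.01)]:
    print(f"beta1 = {b1:.2f}, beta2 = {b2:+.2f}: error SS = {error_ss(b1, b2):.4f}")
```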

14.4 Approaches to Estimating Nonlinear Regression Models 

There are several approaches, or algorithms, to estimating NLRMs: (1) direct search or trial and error,
(2) direct optimization, and (3) iterative linearization.⁵

Direct Search or Trial-and-Error or Derivative-Free Method 

In the previous section we showed how this method works. Although intuitively appealing
because it does not require the use of calculus methods as the other methods do, this
method is generally not used. First, if an NLRM involves several parameters, the method
becomes very cumbersome and computationally expensive. For example, if an NLRM
involves 5 parameters and 25 alternative values for each parameter are considered, you will
have to compute the error sum of squares $25^5$ = 9,765,625 times! Second, there is no
guarantee that the final set of parameter values you have selected will necessarily give you
the absolute minimum error sum of squares. In the language of calculus, you may obtain a
local and not an absolute minimum. In fact, no method guarantees a global minimum.
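A direct search simply evaluates the error sum of squares over a grid of trial values and keeps the smallest. The sketch below, assuming NumPy and arbitrarily chosen grid ranges, does this for the two parameters of Eq. (14.2.2) with the Table 14.1 data and also reproduces the $25^5$ count from the text; with only 25 values per parameter, the best grid point is only an approximation to the least-squares values.

```python
import numpy as np

# Table 14.1 data
fee = np.array([0.520, 0.508, 0.484, 0.46, 0.4398, 0.4238,
                0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738])
asset = np.array([0.5, 5.0, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60])

def error_ss(b1, b2):
    """Error sum of squares of Eq. (14.2.2) at trial values b1, b2."""
    return np.sum((fee - b1 * np.exp(b2 * asset)) ** 2)

# Coarse direct search: 25 trial values per parameter (grid ranges are arbitrary assumptions)
b1_grid = np.linspace(0.40, 0.60, 25)
b2_grid = np.linspace(-0.02, 0.02, 25)
ss = np.array([[error_ss(b1, b2) for b2 in b2_grid] for b1 in b1_grid])
i, j = np.unravel_index(np.argmin(ss), ss.shape)
print(f"best grid point: beta1 = {b1_grid[i]:.4f}, beta2 = {b2_grid[j]:.5f}, SS = {ss[i, j]:.5f}")

# Evaluations needed with 5 parameters and 25 trial values each
n_evals = 25 ** 5
print(f"{n_evals:,}")   # 9,765,625
```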

Direct Optimization 

In direct optimization we differentiate the error sum of squares with respect to each unknown
coefficient, or parameter, set the resulting equations equal to zero, and solve the resulting normal
equations simultaneously. We have already seen this in Eqs. (14.2.4) and (14.2.5). But as you
can see from these equations, they cannot be solved explicitly or analytically. Some iterative
routine is therefore called for. One routine is called the method of steepest descent. We will
not discuss the technical details of this method as they are somewhat involved, but the reader
can find the details in the references. Like the method of trial and error, the method of steepest
descent also involves selecting initial trial values of the unknown parameters, but then it
proceeds more systematically than the hit-or-miss or trial-and-error method. One disadvantage of
this method is that it may converge to the final values of the parameters extremely slowly.

⁵The following discussion leans heavily on these sources: Robert S. Pindyck and Daniel L. Rubinfeld,
Econometric Models and Economic Forecasts, 4th ed., McGraw-Hill, 1998, Chapter 10; Norman R.
Draper and Harry Smith, Applied Regression Analysis, 3d ed., John Wiley & Sons, 1998, Chapter 24;
Arthur S. Goldberger, A Course in Econometrics, Harvard University Press, 1991, Chapter 29; Russell
Davidson and James MacKinnon, op. cit., pp. 201-207; John Fox, Applied Regression Analysis, Linear
Models, and Related Methods, Sage Publications, 1997, pp. 393-400; and Ronald Gallant, Nonlinear
Statistical Models, John Wiley & Sons, 1987.

Iterative Linearization Method

In this method we linearize a nonlinear equation around some initial values of the
parameters. The linearized equation is then estimated by OLS and the initially chosen values are
adjusted. These adjusted values are used to relinearize the model, and again we estimate it
by OLS and readjust the estimated values. This process is continued until there is no
substantial change in the estimated values from the last couple of iterations. The main
technique used in linearizing a nonlinear equation is the Taylor series expansion from
calculus. Rudimentary details of this method are given in Appendix 14A, Section 14A.2.
Estimating NLRM using Taylor series expansion is systematized in two algorithms, known
as the Gauss-Newton iterative method and the Newton-Raphson iterative method.
Since one or both of these methods are now incorporated in several computer packages,
and since a discussion of their technical details will take us far beyond the scope of this
book, there is no need to dwell on them here.⁶ In the next section we discuss some
examples using these methods.
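The iterative linearization idea can be sketched for Eq. (14.2.2) in a few lines. The function below is a plain Gauss-Newton loop written for illustration, with a step-halving safeguard that we added so that no step increases the sum of squares; it is not EViews' routine (footnote 7 notes that EViews defaults to quadratic hill climbing), so iteration counts need not match those reported in Example 14.1. The function and helper names are hypothetical.

```python
import numpy as np

# Table 14.1 data
fee = np.array([0.520, 0.508, 0.484, 0.46, 0.4398, 0.4238,
                0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738])
asset = np.array([0.5, 5.0, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60])

def sse(b, x=asset, y=fee):
    """Sum of squared residuals of Y = b1 * exp(b2 * X) + u."""
    return np.sum((y - b[0] * np.exp(b[1] * x)) ** 2)

def gauss_newton_exp(x, y, b_start, tol=1e-10, max_iter=200):
    """Gauss-Newton (iterative linearization) sketch for Eq. (14.2.2).

    Each iteration linearizes the model by a first-order Taylor expansion around
    the current values, regresses the current residuals on the partial derivatives
    by OLS, and uses the OLS coefficients to adjust the parameters.
    """
    b = np.asarray(b_start, dtype=float)
    for iteration in range(1, max_iter + 1):
        e = np.exp(b[1] * x)
        resid = y - b[0] * e
        J = np.column_stack([e, b[0] * x * e])   # d(fitted)/d(b1), d(fitted)/d(b2)
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        # Step halving: a safeguard added here, not part of the basic method
        t = 1.0
        while sse(b + t * step, x, y) > sse(b, x, y) and t > 1e-8:
            t /= 2.0
        b = b + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return b, iteration

# Starting values: the linear OLS estimates of fee on assets quoted in Example 14.1
b_hat, n_iter = gauss_newton_exp(asset, fee, b_start=[0.5028, -0.002])
print(np.round(b_hat, 4), n_iter)
```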

14.5 Illustrative Examples 


EXAMPLE 14.1
Mutual Fund Advisory Fees

Refer to the data given in Table 14.1 and the NLRM (14.2.2). Using the EViews 6 nonlinear
regression routine, which uses the linearization method,⁷ we obtained the following
regression results; the coefficients, their standard errors, their t values, and their p values
are given in tabular form:

Variable    Coefficient   Std. Error   t Value    p Value
Intercept    0.5089       0.0074        68.2246   0.0000
Asset       -0.0059       0.00048      -12.3150   0.0000

R² = 0.9385   d = 0.3493

From these results, we can write the estimated model as:

$\widehat{\text{Fee}}_i = 0.5089\, e^{-0.0059\, \text{Asset}_i}$   (14.5.1)

Before we discuss these results, it may be noted that if you do not supply the initial values
of the parameters to start the linearization process, EViews will do it on its own. It took
EViews five iterations to obtain the results shown in Eq. (14.5.1). However, you can supply
your own initial values to start the process. To demonstrate, we chose the initial values
$\beta_1 = 0.45$ and $\beta_2 = 0.01$. We obtained the same results as in Eq. (14.5.1), but it took eight
iterations. It is important to note that fewer iterations will be required if your initial values are
not very far from the final values. In some cases you can choose the initial values of the
parameters by simply running an OLS regression of the regressand on the regressor(s),
ignoring the nonlinearities. For instance, using the data in Table 14.1, if you were
to regress fee on assets, the OLS estimate of $\beta_1$ is 0.5028 and that of $\beta_2$ is -0.002, which
are much closer to the final values given in Eq. (14.5.1). (For the technical details, see
Appendix 14A, Section 14A.3.)

⁶There is another method that is sometimes used, called the Marquardt method, which is a
compromise between the method of steepest descent and the linearization (or Taylor series) method. The
interested reader may consult the references for the details of this method.

⁷EViews provides three options: quadratic hill climbing, Newton-Raphson, and Berndt-Hall-Hall-Hausman.
The default option is quadratic hill climbing, which is a variation of the
Newton-Raphson method.

Now about the properties of nonlinear least squares (NLLS) estimators. You may recall that, in the case of linear regression models with normally distributed error terms, we were able to develop exact inference procedures (i.e., test hypotheses) using the t, F, and χ² tests in small as well as large samples. Unfortunately, this is not the case with NLRMs, even with normally distributed error terms. The NLLS estimators are not normally distributed, are not unbiased, and do not have minimum variance in finite, or small, samples. As a result, we cannot use the t test (to test the significance of an individual coefficient) or the F test (to test the overall significance of the estimated regression), because we cannot obtain an unbiased estimate of the error variance σ² from the estimated residuals. Furthermore, the residuals (the differences between the actual Y values and the estimated Y values from the NLRM) do not necessarily sum to zero, ESS and RSS do not necessarily add up to TSS, and therefore R² = ESS/TSS may not be a meaningful descriptive statistic for such models. However, we can compute R² as:


$$R^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2} \qquad (14.5.2)$$

where Y = regressand and û_i = Y_i − Ŷ_i, where Ŷ_i are the estimated Y values from the (fitted) NLRM.
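Eq. (14.5.2) is straightforward to compute. The sketch below evaluates it at the rounded estimates of Eq. (14.5.1) and also reports the sum of the residuals, which, as noted above, need not be zero for an NLRM. It relies on the same assumption as before: the fee and asset values are reconstructed from Table 14.4, not copied from Table 14.1. With rounded coefficients, R² should come out close to, but not exactly at, the 0.9385 reported for this model.

```python
import math

# ASSUMPTION: fee/asset data reconstructed from Table 14.4 (see Example 14.1)
ASSETS = [0.5, 5, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60]
FEES = [0.520, 0.508, 0.484, 0.460, 0.4398, 0.4238,
        0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738]


def nonlinear_r2(y, fitted):
    """R^2 from Eq. (14.5.2): 1 - sum(u_hat^2) / sum((Y - Ybar)^2).

    Also returns the sum of residuals, which an NLRM need not drive to zero.
    """
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ybar = sum(y) / len(y)
    rss = sum(u * u for u in resid)
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - rss / tss, sum(resid)


# Fitted values from the rounded estimates in Eq. (14.5.1)
fitted = [0.5089 * math.exp(-0.0059 * x) for x in ASSETS]
r2, resid_sum = nonlinear_r2(FEES, fitted)
print(f"R^2 = {r2:.4f}, sum of residuals = {resid_sum:.4f}")
```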

Consequently, inferences about the regression parameters in nonlinear regression are usually based on large-sample theory. This theory tells us that the least-squares and maximum likelihood estimators for nonlinear regression models with normal error terms, when the sample size is large, are approximately normally distributed and almost unbiased, and have almost minimum variance. This large-sample theory also applies when the error terms are not normally distributed.⁸

In short, then, all inference procedures in NLRM are large-sample, or asymptotic. Returning to Example 14.1, the t statistics given in Eq. (14.5.1) are meaningful only if interpreted in the large-sample context. In that sense, we can say that the estimated coefficients shown in Eq. (14.5.1) are individually statistically significant. Of course, our sample in the present instance is rather small.

Returning to Eq. (14.5.1), how do we find out the rate of change of Y (= fee) with respect to X (asset size)? Using the basic rules of derivatives, the reader can see that the rate of change of Y with respect to X is:

$$\frac{dY}{dX} = \beta_1 \beta_2 e^{\beta_2 X} = (-0.0059)(0.5089)\,e^{-0.0059X} \qquad (14.5.3)$$

As can be seen, the rate of change of fee depends on the value of the assets. For example, if X = 20 (million), the expected rate of change in the fees charged can be seen from Eq. (14.5.3) to be about −0.0027 percent. Of course, this answer will change depending on the X value used in the computation. Judged by the R² as computed from Eq. (14.5.2), the R² value of 0.9385 suggests that the chosen NLRM fits the data in Table 14.1 quite well. The estimated Durbin-Watson value of 0.3493 may suggest that there is autocorrelation or possibly model specification error. Although there are procedures to take care of these problems, as well as the problem of heteroscedasticity, in NLRM, we will not pursue these topics here. The interested reader may consult the references.
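Eq. (14.5.3) can be evaluated at any asset level. The sketch below uses the rounded coefficients of Eq. (14.5.1). At X = 20 it gives roughly −0.0027. Because e^(β2X) falls as X rises when β2 < 0, the slope shrinks toward zero as X grows. The function name is illustrative.

```python
import math

B1, B2 = 0.5089, -0.0059  # rounded estimates from Eq. (14.5.1)


def fee_slope(x, b1=B1, b2=B2):
    """dY/dX = b1 * b2 * exp(b2 * X), Eq. (14.5.3)."""
    return b1 * b2 * math.exp(b2 * x)


for x in (0.5, 20, 60):
    print(f"X = {x:>4}: dY/dX = {fee_slope(x):.5f}")
```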


8 John Neter, Michael H. Kutner, Christopher J. Nachtsheim, and William Wasserman, Applied Regression Analysis, 3d ed., Irwin, 1996, pp. 548-549.
EXAMPLE 14.2

The Cobb-Douglas Function of the Mexican Economy

Refer to the data given in Exercise 14.9 (Table 14.3). These data refer to the Mexican economy for the years 1955-1974. We will see if the NLRM given in Eq. (14.1.4) fits the data, noting that Y = output, X2 = labor input, and X3 = capital input. Using EViews 6, we obtained the following regression results after 32 iterations:

Variable     Coefficient   Std. Error   t Value    p Value
Intercept    0.5292        0.2712       1.9511     0.0677
Labor        0.1810        0.1412       1.2814     0.2173
Capital      0.8827        0.0708       12.4658    0.0000

R² = 0.9942    d = 0.2899


Therefore, the estimated Cobb-Douglas production function is:

$$\widehat{\text{GDP}}_t = 0.5292\,\text{Labor}_t^{0.1810}\,\text{Capital}_t^{0.8827} \qquad (14.5.4)$$

Interpreted asymptotically, the equation shows that only the coefficient of the capital input is 
significant in this model. In Exercise 14.9 you are asked to compare these results with those 
obtained from the multiplicative Cobb-Douglas production function as given in Eq. (14.1.2). 
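To see what Eq. (14.5.4) implies, the sketch below evaluates the fitted function at three of the input combinations in Table 14.3 and compares each result with actual GDP. The choice of years is arbitrary. The sum of the two output elasticities, a common returns-to-scale summary for a Cobb-Douglas function, is printed only as an illustration, and no inference is attached to it here.

```python
# Three rows of Table 14.3: year -> (GDP, labor, capital)
ROWS = {
    1955: (114_043, 8_310, 182_113),
    1965: (212_323, 11_746, 315_715),
    1974: (374_977, 14_154, 609_825),
}

B1, B2, B3 = 0.5292, 0.1810, 0.8827  # estimates from Eq. (14.5.4)


def cobb_douglas(labor, capital, b1=B1, b2=B2, b3=B3):
    """Fitted output b1 * labor**b2 * capital**b3 (error term omitted)."""
    return b1 * labor ** b2 * capital ** b3


for year, (gdp, labor, capital) in ROWS.items():
    fit = cobb_douglas(labor, capital)
    print(f"{year}: actual {gdp:,}, fitted {fit:,.0f}, "
          f"error {100 * (fit - gdp) / gdp:+.1f}%")

print(f"sum of output elasticities: {B2 + B3:.4f}")
```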


EXAMPLE 14.3

Growth of U.S. Population, 1970-2007

FIGURE 14.2
Population versus Year.

The table in Exercise 14.8 (Table 14.2) gives data on total U.S. population for the period 1970-2007. A logistic model of the following type is often used to measure the growth of some populations, such as human beings or bacteria:

$$Y_t = \frac{\beta_1}{1 + e^{(\beta_2 + \beta_3 t)}} + u_t \qquad (14.5.5)$$

where Y = population, in millions; t = time, measured chronologically; and the β's are the parameters.

This model is nonlinear in the parameters; there is no simple way to convert it into a model that is linear in the parameters. So we will need to use one of the nonlinear estimation methods to estimate the parameters. Notice an interesting feature of this model: Although there are only two variables in the model, population and time, there are three unknown parameters, which shows that in an NLRM there can be more parameters than variables.

An attempt to fit Eq. (14.5.5) to our data was not successful, as all the estimated coefficients were statistically insignificant. This is probably not surprising, for if we plot population against time, we obtain Figure 14.2.

[Figure 14.2: population (vertical axis, about 200,000 to 320,000) plotted against Year.]
FIGURE 14.3
Logarithm of Population versus Year.


Figure 14.2 shows that there is an almost linear relationship between the two variables. If we plot the logarithm of population against time, we obtain the following figure:


[Figure 14.3: logarithm of population (vertical axis, about 12.25 to 12.65) plotted against Year.]


The slope of this line (multiplied by 100) gives us the growth rate of population (why?).

As a matter of fact, if we regress the log of population on time, we get the following 
results: 


Dependent Variable: LPOPULATION
Method: Least Squares
Sample: 1970-2007
Included observations: 38

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            -8.710413      0.147737      -58.95892      0.0000
YEAR          0.010628      7.43E-05      143.0568       0.0000

R-squared            0.998244    Mean dependent var       12.424011
Adjusted R-squared   0.998195    S.D. dependent var        0.118211
S.E. of regression   0.005022    Akaike info criterion    -7.698713
Sum squared resid    0.000908    Schwarz criterion        -7.612525
Log likelihood       148.2756    Hannan-Quinn criter.     -7.668048
F-statistic          20465.26    Durbin-Watson stat        0.366006
Prob(F-statistic)    0.000000

This table shows that, over the period 1970-2007, the U.S. population has been growing at the rate of about 1.06 percent per year. The R² value of 0.998 suggests that there is almost a perfect fit.
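The log-linear regression above is ordinary OLS on ln(Population). A minimal sketch using the Table 14.2 figures as printed should reproduce a slope close to 0.0106. The slope does not depend on the units in which population is recorded. Small differences from the EViews output are possible if any printed entry differs from the data the author actually used. The last line also reports exp(b) − 1, the compound annual rate, which for a slope this small is close to 100 × b. The helper name `ols_fit` is illustrative.

```python
import math

# Table 14.2, 1970-2007, as printed
POP = [205052, 207661, 209896, 211909, 213854, 215973, 218035, 220239,
       222585, 225055, 227726, 229966, 232188, 234307, 236348, 238466,
       240651, 242804, 245021, 247342, 250132, 253493, 256894, 260255,
       263436, 266557, 269667, 272912, 276115, 279295, 282407, 285339,
       288189, 290941, 293609, 299801, 299157, 302405]
YEARS = list(range(1970, 2008))


def ols_fit(x, y):
    """Intercept, slope and R^2 of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    tss = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1.0 - rss / tss


a, b, r2 = ols_fit(YEARS, [math.log(p) for p in POP])
print(f"intercept = {a:.4f}, slope = {b:.6f}, R^2 = {r2:.4f}")
print(f"growth: {100 * b:.2f}% per year (100 x slope); "
      f"compound {100 * (math.exp(b) - 1):.2f}%")
```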

This example brings out an important point: sometimes a model that is linear in the parameters may be preferable to one that is nonlinear in the parameters.












EXAMPLE 14.4

Box-Cox Transformation: U.S. Population, 1970-2007

In Appendix 6A.5 we briefly considered the Box-Cox transformation. Let us continue with 
Example 14.3 but assume the following model: 

$$\text{Population}^{\lambda} = \beta_1 + \beta_2\,\text{Year} + u$$

As noted in Appendix 6A.5, depending on the value of λ we have the following possibilities:

Value of λ    Model
−1            1/Population = β1 + β2 Year + u
0             ln Population = β1 + β2 Year + u
1             Population = β1 + β2 Year + u

The first is an inverse model, the second is a semilog model (which we have already estimated in Example 14.3), and the third is a linear (in the variables) model.
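The three special cases above can be obtained from the standard Box-Cox transformation (Y^λ − 1)/λ, which has ln Y as its limit at λ = 0. We assume this is the form used in Appendix 6A.5. For λ = −1 the transformation gives 1 − 1/Y, and for λ = 1 it gives Y − 1. These are linear in 1/Population and Population, so a regression on Year fits them exactly as well as the simpler forms listed; in the inverse case only the sign of the slope changes. A minimal sketch:

```python
import math


def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1) / lam, with the limit ln(y) at lam = 0.

    Requires y > 0.
    """
    if y <= 0:
        raise ValueError("Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


y = 250_000.0  # a population figure of the size found in Table 14.2
for lam in (-1, 0, 1):
    print(lam, box_cox(y, lam))
```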

Which of these models is appropriate for the population data? The Box-Cox routine in 
STATA (Version 10) can be used to answer this question: 


Test H0:    Restricted log likelihood    LR statistic (χ²)    p-value (Prob > χ²)
θ = −1      −444.42475                   0.14                 0.707
θ = 0       −444.38813                   0.07                 0.794
θ = 1       −444.75684                   0.81                 0.369

Note: In our notation, theta (θ) is the same thing as lambda (λ). The table shows that, on the basis of the likelihood ratio (LR) test, we cannot reject any of these λ values as possible values for the power of population; that is, in the present example, the linear, inverse, and semilog models are equally plausible candidates to depict the behavior of population over the sample period 1970-2007. Therefore, we present the results of all three models:


Dependent variable    Intercept       Slope          R²
1/Population          0.000089        -4.28e-08      0.9986
                 t = (166.14)         (-1568.10)
ln Population         -8.7104         0.0106         0.9982
                 t = (-58.96)         (143.06)
Population            -5042627        2661.825       0.9928
                 t = (-66.92)         (70.24)


In all three models the estimated coefficients are highly statistically significant. But note that the R² values are not directly comparable, because the dependent variables in the three models are different.
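The p-values in the STATA table can be checked by hand. Each LR statistic is compared with a χ² distribution with 1 degree of freedom. Since a χ²(1) variable is the square of a standard normal variable, its upper-tail probability is erfc(√(s/2)). Because the reported LR statistics are rounded to two decimals, the recomputed p-values agree with the table only to about two decimal places. The helper name is illustrative.

```python
import math


def chi2_1_pvalue(stat):
    """Upper-tail probability P(chi-square with 1 df > stat).

    Uses chi2_1 = Z**2, so P(Z**2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# LR statistics as reported (rounded to two decimals) in the STATA table
for theta, lr in [(-1, 0.14), (0, 0.07), (1, 0.81)]:
    print(f"theta = {theta:>2}: LR = {lr:.2f}, p = {chi2_1_pvalue(lr):.3f}")
```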

This example shows how nonlinear estimation techniques can be applied in concrete 
Summary and Conclusions

The main points discussed in this chapter can be summarized as follows:

1. Although linear regression models predominate in theory and practice, there are occasions where nonlinear-in-the-parameter regression models (NLRMs) are useful.

2. The mathematics underlying linear regression models is comparatively simple in that one can obtain explicit, or analytical, solutions for the coefficients of such models. The small-sample and large-sample theory of inference for such models is well established.

3. In contrast, for intrinsically nonlinear regression models (NLRMs), parameter values cannot be obtained explicitly. They have to be estimated numerically, that is, by iterative procedures.

4. There are several methods of obtaining estimates of NLRMs, such as (1) trial and error, (2) nonlinear least squares (NLLS), and (3) linearization through Taylor series expansion.

5. Computer packages now have built-in routines, such as Gauss-Newton, Newton-Raphson, and Marquardt. These are all iterative routines.

6. NLLS estimators do not possess optimal properties in finite samples, but in large samples they do have such properties. Therefore, the results of NLLS in small samples must be interpreted carefully.

7. Autocorrelation, heteroscedasticity, and model specification problems can plague NLRMs, as they do linear regression models.

8. We illustrated NLLS with several examples. With the ready availability of user-friendly software packages, estimation of NLRMs should no longer be a mystery. Therefore, the reader should not shy away from such models whenever theoretical or practical reasons dictate their use. As a matter of fact, if you refer to Exercise 12.10, you will see from Eq. (1) that it is an intrinsically nonlinear regression model that should be estimated as such.


EXERCISES

Questions

14.1. What is meant by intrinsically linear and intrinsically nonlinear regression models? 
Give some examples. 

14.2. Since the error term in the Cobb-Douglas production function can be entered multiplicatively or additively, how would you decide between the two?

14.3. What is the difference between OLS and nonlinear least-squares (NLLS) 
estimation? 

14.4. The relationship between pressure and temperature in saturated steam can be expressed as:*

$$Y = \beta_1 (10)^{\beta_2 t/(\gamma + t)} + u$$

where Y = pressure and t = temperature. Using the method of nonlinear least squares (NLLS), obtain the normal equations for this model.


*Adapted from Draper and Smith, op. cit., p. 554.
14.5. State whether the following statements are true or false. Give your reasoning.

a. Statistical inference in NLLS regression cannot be made on the basis of the usual t, F, and χ² tests even if the error term is assumed to be normally distributed.

b. The coefficient of determination (R²) is not a particularly meaningful number for an NLRM.

14.6. How would you linearize the CES production function discussed in the chapter? 
Show the necessary steps. 

14.7. Models that describe the behavior of a variable over time are called growth models. Such models are used in a variety of fields, such as economics, biology, botany, ecology, and demography. Growth models can take a variety of forms, both linear and nonlinear. Consider the following models, where Y is the variable whose growth we want to measure; t is time, measured chronologically; and u_t is the stochastic error term.

a. $Y_t = \beta_1 + \beta_2 t + u_t$

b. $\ln Y_t = \beta_1 + \beta_2 t + u_t$

c. Logistic growth model: $Y_t = \dfrac{\beta_1}{1 + \beta_2 e^{-\beta_3 t}} + u_t$

d. Gompertz growth model: $Y_t = \beta_1 e^{-\beta_2 e^{-\beta_3 t}} + u_t$

Find out the properties of these models by considering the growth of Y in relation to time.

Empirical Exercises 

14.8. The data in Table 14.2 give U.S. population, in millions of persons, for the period 1970-2007. Fit the growth models given in Exercise 14.7 and decide which model gives a better fit. Interpret the parameters of the model.

14.9. Table 14.3 gives data on real GDP, labor, and capital for Mexico for the period 1955-1974. See if the multiplicative Cobb-Douglas production function given in Eq. (14.1.2a) fits these data. Compare your results with those obtained from fitting the additive Cobb-Douglas production function given in Eq. (14.1.4), whose results are given in Example 14.2. Which is a better fit?


TABLE 14.2 U.S. Population

Year   Population (Millions)   Year   Population (Millions)
1970   205,052                 1989   247,342
1971   207,661                 1990   250,132
1972   209,896                 1991   253,493
1973   211,909                 1992   256,894
1974   213,854                 1993   260,255
1975   215,973                 1994   263,436
1976   218,035                 1995   266,557
1977   220,239                 1996   269,667
1978   222,585                 1997   272,912
1979   225,055                 1998   276,115
1980   227,726                 1999   279,295
1981   229,966                 2000   282,407
1982   232,188                 2001   285,339
1983   234,307                 2002   288,189
1984   236,348                 2003   290,941
1985   238,466                 2004   293,609
1986   240,651                 2005   299,801
1987   242,804                 2006   299,157
1988   245,021                 2007   302,405

Source: Economic Report of the President, 2008.
TABLE 14.3 Production Function Data for the Mexican Economy

Observation   GDP       Labor    Capital    Observation   GDP       Labor    Capital
1955          114,043   8,310    182,113    1965          212,323   11,746   315,715
1956          120,410   8,529    193,749    1966          226,977   11,521   337,642
1957          129,187   8,738    205,192    1967          241,194   11,540   363,599
1958          134,705   8,952    215,130    1968          260,881   12,066   391,847
1959          139,960   9,171    225,021    1969          277,498   12,297   422,382
1960          150,511   9,569    237,026    1970          296,530   12,955   455,049
1961          157,897   9,527    248,897    1971          306,712   13,338   484,677
1962          165,286   9,662    260,661    1972          329,030   13,738   520,553
1963          178,491   10,334   275,466    1973          354,057   15,924   561,531
1964          199,457   10,981   295,378    1974          374,977   14,154   609,825


Appendix 14A 


14A.1 Derivation of Equations (14.2.4) and (14.2.5) 


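Write Eq. (14.2.2) as

$$u_i = Y_i - \beta_1 e^{\beta_2 X_i} \qquad (1)$$

Therefore,

$$\sum u_i^2 = \sum \left(Y_i - \beta_1 e^{\beta_2 X_i}\right)^2 \qquad (2)$$

The error sum of squares is thus a function of β1 and β2, since the values of Y and X are known. Therefore, to minimize the error sum of squares, we have to partially differentiate it with respect to the two unknowns, which gives:

$$\frac{\partial \sum u_i^2}{\partial \beta_1} = 2 \sum \left(Y_i - \beta_1 e^{\beta_2 X_i}\right)\left(-e^{\beta_2 X_i}\right) \qquad (3)$$

$$\frac{\partial \sum u_i^2}{\partial \beta_2} = 2 \sum \left(Y_i - \beta_1 e^{\beta_2 X_i}\right)\left(-\beta_1 X_i e^{\beta_2 X_i}\right) \qquad (4)$$

By the first-order condition of optimization, setting the preceding equations to zero and solving them simultaneously, we obtain Eqs. (14.2.4) and (14.2.5). Note that in differentiating the error sum of squares we have used the chain rule.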
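A quick way to confirm partial derivatives such as those in (3) and (4) is to compare them with a central finite-difference approximation of the error sum of squares. The data below are hypothetical (roughly Y = e^X) and serve only to exercise the formulas. The function names are illustrative.

```python
import math


def ess(b1, b2, x, y):
    """Error sum of squares: sum of (Y_i - b1*exp(b2*X_i))**2, as in Eq. (2)."""
    return sum((yi - b1 * math.exp(b2 * xi)) ** 2 for xi, yi in zip(x, y))


def ess_gradient(b1, b2, x, y):
    """Analytic partial derivatives of the ESS, as in Eqs. (3) and (4)."""
    g1 = g2 = 0.0
    for xi, yi in zip(x, y):
        e = math.exp(b2 * xi)
        u = yi - b1 * e
        g1 += 2.0 * u * (-e)
        g2 += 2.0 * u * (-b1 * xi * e)
    return g1, g2


def numeric_gradient(b1, b2, x, y, h=1e-6):
    """Central finite-difference approximation to the same derivatives."""
    d1 = (ess(b1 + h, b2, x, y) - ess(b1 - h, b2, x, y)) / (2 * h)
    d2 = (ess(b1, b2 + h, x, y) - ess(b1, b2 - h, x, y)) / (2 * h)
    return d1, d2


# Hypothetical data, roughly Y = exp(X), for illustration only
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.7, 7.4, 20.1, 54.6, 148.4]
print(ess_gradient(1.0, 0.9, X, Y))
print(numeric_gradient(1.0, 0.9, X, Y))
```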


14A.2 The Linearization Method 


Students familiar with calculus will recall Taylor's theorem, which states that any arbitrary function f(X) that is continuous and has a continuous nth-order derivative can be approximated around the point X = X0 by a polynomial function and a remainder as follows:

$$f(X) = f(X_0) + \frac{f'(X_0)(X - X_0)}{1!} + \frac{f''(X_0)(X - X_0)^2}{2!} + \cdots + \frac{f^{(n)}(X_0)(X - X_0)^n}{n!} + R \qquad (1)$$

where f′(X0) is the first derivative of f(X) evaluated at X = X0, f″(X0) is the second derivative of f(X) evaluated at X = X0, and so on, where n! (read n factorial) stands for n(n − 1)(n − 2)···1 with the convention that 0! = 1, and R stands for the remainder. If we take n = 1, we get a linear approximation; choosing n = 2, we get a second-degree polynomial approximation. As you can expect, the higher the order of the polynomial, the better the approximation to the original function, at least in the neighborhood of X0. The series given in Eq. (1) is called Taylor's series expansion of f(X) around the point X = X0. As an example, consider the function:

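$$Y = f(X) = \alpha_1 + \alpha_2 X + \alpha_3 X^2 + \alpha_4 X^3$$

Suppose we want to approximate it at X = 0. We now obtain:

$$f(0) = \alpha_1 \qquad f'(0) = \alpha_2 \qquad f''(0) = 2\alpha_3 \qquad f'''(0) = 6\alpha_4$$

Hence we can obtain the following approximations:

First order:
$$Y = f(0) + \frac{f'(0)}{1!}X = \alpha_1 + \alpha_2 X + \text{remainder}\ (= \alpha_3 X^2 + \alpha_4 X^3)$$

Second order:
$$Y = f(0) + \frac{f'(0)}{1!}X + \frac{f''(0)}{2!}X^2 = \alpha_1 + \alpha_2 X + \alpha_3 X^2 + \text{remainder}\ (= \alpha_4 X^3)$$

Third order:
$$Y = \alpha_1 + \alpha_2 X + \alpha_3 X^2 + \alpha_4 X^3$$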

The third-order approximation reproduces the original equation exactly. 

The objective of Taylor series approximation is usually to choose a lower-order polynomial in the 
hope that the remainder term will be inconsequential. It is often used to approximate a nonlinear 
function by a linear function, by dropping the higher-order terms. 
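The cubic example can be checked numerically. The sketch builds the Taylor polynomial about X = 0 from the derivatives listed above and prints the remainder for orders 1 to 3. The coefficients are illustrative and do not come from the text. With these values, the remainder at order 1 equals α3X² + α4X³ and vanishes at order 3.

```python
import math


def cubic(x, a):
    """f(X) = a1 + a2*X + a3*X**2 + a4*X**3."""
    a1, a2, a3, a4 = a
    return a1 + a2 * x + a3 * x ** 2 + a4 * x ** 3


def taylor_at_zero(x, a, order):
    """Taylor polynomial of the cubic about X = 0 up to the given order.

    Uses f(0) = a1, f'(0) = a2, f''(0) = 2*a3, f'''(0) = 6*a4.
    """
    derivs = [a[0], a[1], 2 * a[2], 6 * a[3]]
    return sum(derivs[k] * x ** k / math.factorial(k) for k in range(order + 1))


a = (1.0, -2.0, 0.5, 0.25)  # illustrative coefficients, not from the text
x = 2.0
for order in (1, 2, 3):
    approx = taylor_at_zero(x, a, order)
    print(order, approx, cubic(x, a) - approx)  # last column = remainder
```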

The Taylor series approximation can be easily extended to a function containing more than one X. For example, consider the following function:

$$Y = f(X, Z) \qquad (2)$$

and suppose we want to expand it around X = a and Z = b. Taylor's theorem shows that

$$f(X, Z) = f(a, b) + f_X(a, b)(X - a) + f_Z(a, b)(Z - b) + \frac{1}{2!}\Big[f_{XX}(a, b)(X - a)^2 + 2 f_{XZ}(a, b)(X - a)(Z - b) + f_{ZZ}(a, b)(Z - b)^2\Big] + \cdots \qquad (3)$$

where f_X = partial derivative of the function with respect to (w.r.t.) X, f_XX = second partial derivative of the function w.r.t. X, f_XZ = cross-partial derivative w.r.t. X and Z, and similarly for the variable Z. If we want a linear approximation to the function, we use f(a, b) plus the first-order terms in Eq. (3); if we want a quadratic, or second-degree, approximation, we also add the second-order terms in brackets; and so on.

14A.3 Linear Approximation of the Exponential 
Function Given in Equation (14.2.2) 

The function under consideration is:

$$Y = f(\beta_1, \beta_2) = \beta_1 e^{\beta_2 X} \qquad (1)$$

Note: For ease of manipulation, we have dropped the observation subscript.

Remember that in this function the unknowns are the β coefficients. Let us linearize this function at β1 = β1* and β2 = β2*, where the starred quantities are given fixed values. To linearize this, we proceed as follows:

$$Y = f(\beta_1, \beta_2) = f(\beta_1^*, \beta_2^*) + f_{\beta_1}(\beta_1^*, \beta_2^*)(\beta_1 - \beta_1^*) + f_{\beta_2}(\beta_1^*, \beta_2^*)(\beta_2 - \beta_2^*) \qquad (2)$$
where f_β1 and f_β2 are the partial derivatives of the function (1) with respect to the unknowns, and these derivatives will be evaluated at the (assumed) starred values of the unknown parameters. Note that we are using only the first derivatives in the preceding expression, since we are linearizing the function. Now assume that β1* = 0.45 and β2* = 0.01, which are pure guess-estimates of the true coefficients. Now

$$f(\beta_1^* = 0.45,\ \beta_2^* = 0.01) = 0.45\,e^{0.01 X_i}$$

$$f_{\beta_1} = e^{\beta_2 X_i} \quad \text{and} \quad f_{\beta_2} = \beta_1 X_i e^{\beta_2 X_i} \qquad (3)$$

by the standard rules of differentiation. Evaluating these derivatives at the given values and reverting to Eq. (2), we obtain:

$$Y_i = 0.45\,e^{0.01 X_i} + e^{0.01 X_i}(\beta_1 - 0.45) + 0.45 X_i e^{0.01 X_i}(\beta_2 - 0.01) \qquad (4)$$

which we write as:

$$\left(Y_i - 0.45\,e^{0.01 X_i}\right) = e^{0.01 X_i}\alpha_1 + 0.45 X_i e^{0.01 X_i}\alpha_2 \qquad (5)$$

where

$$\alpha_1 = (\beta_1 - 0.45) \quad \text{and} \quad \alpha_2 = (\beta_2 - 0.01) \qquad (6)$$

Now let Y_i* = (Y_i − 0.45e^(0.01X_i)), X_1i = e^(0.01X_i), and X_2i = 0.45X_i e^(0.01X_i). Using these definitions and adding the error term, we can finally write Eq. (5) as:

$$Y_i^* = \alpha_1 X_{1i} + \alpha_2 X_{2i} + u_i \qquad (7)$$

Lo and behold, we now have a linear regression model. Since Y_i*, X_1i, and X_2i can be readily computed from the data, we can easily estimate Eq. (7) by OLS and obtain the values of α1 and α2. Then, from Eq. (6), we obtain:

$$\beta_1 = \hat{\alpha}_1 + 0.45 \quad \text{and} \quad \beta_2 = \hat{\alpha}_2 + 0.01 \qquad (8)$$

Call these values β1** and β2**, respectively. Using these (revised) values, we can start the iterative process given in Eq. (2), obtaining yet another set of values of the β coefficients. We can go on iterating (or linearizing) in this fashion until there is no substantial change in the values of the β coefficients. In Example 14.1, it took five iterations, but for the Mexican Cobb-Douglas example (Example 14.2), it took 32 iterations. But the underlying logic behind these iterations is the procedure just illustrated.

For the mutual fund fee structure example in Section 14.3, the Y*, X1, and X2 defined following Eq. (6) are as shown in Table 14.4; the basic data are given in Table 14.1. From these values, the regression results corresponding to Eq. (7) are:

Dependent variable: Y*
Method: Least squares

Variable   Coefficient   Std. Error   t-Statistic   Prob.
X1          0.022739     0.014126       1.609705    0.1385
X2         -0.010693     0.000790     -13.52990     0.0000

R² = 0.968324    Durbin-Watson d statistic = 0.308883


Now using Eq. (8), the reader can verify that

$$\beta_1^{**} = 0.4727 \quad \text{and} \quad \beta_2^{**} = -0.00069 \qquad (9)$$
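The single linearization step that produces Eq. (9) can be reproduced from the values in Table 14.4 alone. The procedure is to regress Y* on X1 and X2 without an intercept, as in Eq. (7), and then add back the starting values, as in Eq. (8). The sketch solves the 2 × 2 normal equations directly. Up to rounding, it should return α1 ≈ 0.022739 and α2 ≈ −0.010693, and hence the values in Eq. (9). The helper name is illustrative.

```python
# Table 14.4: Y*, X1 and X2 evaluated at beta1* = 0.45, beta2* = 0.01
Y_STAR = [0.067744, 0.034928, -0.013327, -0.062825, -0.109831, -0.154011,
          -0.195936, -0.236580, -0.276921, -0.317740, -0.397464, -0.446153]
X1 = [1.005013, 1.051271, 1.105171, 1.161834, 1.221403, 1.284025,
      1.349859, 1.419068, 1.491825, 1.568312, 1.733253, 1.822119]
X2 = [0.226128, 2.365360, 4.973269, 7.842381, 10.99262, 14.44529,
      18.22309, 22.35031, 26.85284, 31.75832, 42.89801, 49.19721]


def ols_two_regressors_no_intercept(y, x1, x2):
    """Solve the 2x2 normal equations for y = a1*x1 + a2*x2 + u (no intercept)."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    a1 = (s22 * s1y - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    return a1, a2


alpha1, alpha2 = ols_two_regressors_no_intercept(Y_STAR, X1, X2)
beta1, beta2 = alpha1 + 0.45, alpha2 + 0.01  # Eq. (8)
print(f"alpha1 = {alpha1:.6f}, alpha2 = {alpha2:.6f}")
print(f"beta1** = {beta1:.4f}, beta2** = {beta2:.5f}")
```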
TABLE 14.4

Y*           X1          X2
0.067744     1.005013    0.226128
0.034928     1.051271    2.365360
-0.013327    1.105171    4.973269
-0.062825    1.161834    7.842381
-0.109831    1.221403    10.99262
-0.154011    1.284025    14.44529
-0.195936    1.349859    18.22309
-0.236580    1.419068    22.35031
-0.276921    1.491825    26.85284
-0.317740    1.568312    31.75832
-0.397464    1.733253    42.89801
-0.446153    1.822119    49.19721


Contrast these numbers with the initial guesses of 0.45 and 0.01, respectively, for the two parameters. Using the new estimates given in Eq. (9), you can start the iterative procedure once more and go on iterating until there is "convergence" in the sense that the final round of the estimates does not differ much from the round before that. Of course, you will require fewer iterations if your initial guess is closer to the final values. Also, notice that we have used only the linear term in Taylor's series expansion. If you were to use the quadratic or higher-order terms in the expansion, perhaps you would reach the final values more quickly. But in many applications the linear approximation has proved to be quite good.





Chapter 15

Qualitative Response Regression Models

In all the regression models that we have considered so far, we have implicitly assumed that 
the regressand, the dependent variable, or the response variable Y is quantitative, whereas 
the explanatory variables are either quantitative, qualitative (or dummy), or a mixture 
thereof. In fact, in Chapter 9, on dummy variables, we saw how the dummy regressors are 
introduced in a regression model and what role they play in specific situations. 

In this chapter we consider several models in which the regressand itself is qualitative in nature. Although increasingly used in various areas of social sciences and medical research, qualitative response regression models pose interesting estimation and interpretation challenges. In this chapter we only touch on some of the major themes in this area, leaving the details to more specialized books.¹

15.1 The Nature of Qualitative Response Models 


Suppose we want to study the labor force participation (LFP) decision of adult males. Since 
an adult is either in the labor force or not, LFP is a yes or no decision. Hence, the response 
variable, or regressand, can take only two values, say, 1 if the person is in the labor 
force and 0 if he or she is not. In other words, the regressand is a binary, or dichotomous, 
variable. Labor economics research suggests that the LFP decision is a function of the 
unemployment rate, average wage rate, education, family income, etc. 

As another example, consider U.S. presidential elections. Assume that there are two political parties, Democratic and Republican. The dependent variable here is vote choice between the two political parties. Suppose we let Y = 1 if the vote is for a Democratic candidate, and Y = 0 if the vote is for a Republican candidate. A considerable amount of research on this topic has been done by the economist Ray Fair of Yale University and several political scientists.² Some of the variables used in the vote choice are growth rate of GDP, unemployment and inflation rates, whether the candidate is running for reelection, etc.

1 At the introductory level, the reader may find the following sources very useful: Daniel A. Powers and Yu Xie, Statistical Methods for Categorical Data Analysis, Academic Press, 2000; John H. Aldrich and Forrest Nelson, Linear Probability, Logit, and Probit Models, Sage Publications, 1984; and Tim Futing Liao, Interpreting Probability Models: Logit, Probit and Other Generalized Linear Models, Sage Publications, 1994. For a very comprehensive review of the literature, see G. S. Maddala, Limited-Dependent and Qualitative Variables in Econometrics, Cambridge University Press, 1983.

2 See, for example, Ray Fair, "Econometrics and Presidential Elections," Journal of Economic Perspectives, Summer 1996, pp. 89-102, and Michael S. Lewis-Beck, Economics and Elections: The Major Western Democracies, University of Michigan Press, Ann Arbor, 1980.
For the present purposes, the important thing to note is that the regressand is a qualitative variable.

One can think of several other examples where the regressand is qualitative in nature. Thus, a family either owns a house or it does not, it has disability insurance or it does not, both husband and wife are in the labor force or only one spouse is. Similarly, a certain drug is effective in curing an illness or it is not. A firm decides to declare a stock dividend or not, a senator decides to vote for a tax cut or not, a U.S. president decides to veto a bill or accept it, etc.

We do not have to restrict our response variable to yes/no or dichotomous categories 
only. Returning to our presidential elections example, suppose there are three parties, 
Democratic, Republican, and Independent. The response variable here is trichotomous. In 
general, we can have a polychotomous (or multiple-category) response variable. 

What we plan to do is to first consider the dichotomous regressand and then consider various extensions of the basic model. But before we do that, it is important to note a fundamental difference between a regression model where the regressand Y is quantitative and a model where it is qualitative.

In a model where Y is quantitative, our objective is to estimate its expected, or mean, value given the values of the regressors. In terms of Chapter 2, what we want is E(Y_i | X_1i, X_2i, ..., X_ki), where the X's are regressors, both quantitative and qualitative. In models where Y is qualitative, our objective is to find the probability of something happening, such as voting for a Democratic candidate, or owning a house, or belonging to a union, or participating in a sport, etc. Hence, qualitative response regression models are often known as probability models.

In the rest of this chapter, we seek answers to the following questions: 

1. How do we estimate qualitative response regression models? Can we simply estimate 
them with the usual OLS procedures? 

2. Are there special inference problems? In other words, is the hypothesis testing procedure any different from the ones we have learned so far?

3. If a regressand is qualitative, how can we measure the goodness of fit of such models? Is the conventionally computed R² of any value in such models?

4. Once we go beyond the dichotomous regressand case, how do we estimate and interpret the polychotomous regression models? Also, how do we handle models in which the regressand is ordinal, that is, an ordered categorical variable, such as schooling (less than 8 years, 8 to 11 years, 12 years, and 13 or more years), or the regressand is nominal where there is no inherent ordering, such as ethnicity (Black, White, Hispanic, Asian, and other)?

5. How do we model phenomena such as the number of visits to one’s physician per year, 
the number of patents received by a firm in a given year, the number of articles published 
by a college professor in a year, the number of telephone calls received in a span of 
5 minutes, or the number of cars passing through a toll booth in a span of 5 minutes? 
Such phenomena, called count data, or rare event data, are an example of the Poisson 
(probability) process. 

In this chapter we provide answers to some of these questions at the elementary level, 
for some of the topics are quite advanced and require more background in mathematics and 
statistics than assumed in this book. References cited in the various footnotes may be 
consulted for further details. 

We start our study of qualitative response models by first considering the binary response regression model. There are four approaches to developing a probability model for a binary response variable:

1. The linear probability model (LPM)

2. The logit model

3. The probit model

4. The tobit model

Because of its comparative simplicity, and because it can be estimated by ordinary least squares (OLS), we will first consider the LPM, leaving the other three models for subsequent sections.

15.2 The Linear Probability Model (LPM) 

To fix ideas, consider the following regression model: 

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (15.2.1)$$

where X = family income and Y = 1 if the family owns a house and 0 if it does not own a house.

Model (15.2.1) looks like a typical linear regression model, but because the regressand is binary, or dichotomous, it is called a linear probability model (LPM). This is because the conditional expectation of Y_i given X_i, E(Y_i | X_i), can be interpreted as the conditional probability that the event will occur given X_i, that is, Pr(Y_i = 1 | X_i). Thus, in our example, E(Y_i | X_i) gives the probability that a family whose income is the given amount X_i owns a house.

The justification of the name LPM for models like Eq. (15.2.1) can be seen as follows. Assuming E(u_i) = 0, as usual (to obtain unbiased estimators), we obtain

$$E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i \qquad (15.2.2)$$

Now, if P_i = probability that Y_i = 1 (that is, the event occurs), and (1 − P_i) = probability that Y_i = 0 (that is, the event does not occur), the variable Y_i has the following (probability) distribution:

Y_i      Probability
0        1 − P_i
1        P_i
Total    1

That is, Y_i follows the Bernoulli probability distribution.

Now, by the definition of mathematical expectation, we obtain:

$$E(Y_i) = 0(1 - P_i) + 1(P_i) = P_i \qquad (15.2.3)$$

Comparing Eq. (15.2.2) with Eq. (15.2.3), we can equate

$$E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i = P_i \qquad (15.2.4)$$

that is, the conditional expectation of the model (15.2.1) can, in fact, be interpreted as the conditional probability of Y_i. In general, the expectation of a Bernoulli random variable is the probability that the random variable equals 1. In passing, note that if there are n independent trials, each with a probability p of success and probability (1 − p) of failure, and X of these trials represent the number of successes, then X is said to follow the binomial distribution. The mean of the binomial distribution is np and its variance is np(1 − p). The term success is defined in the context of the problem.
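The binomial mean and variance quoted above can be verified directly from the probability mass function. With n = 1 this is the Bernoulli case, whose mean is simply the probability of success. The parameter values below are arbitrary, and the helper name is illustrative.

```python
from math import comb


def binomial_moments(n, p):
    """Mean and variance computed directly from the binomial pmf."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var


for n, p in [(1, 0.3), (10, 0.3)]:  # n = 1 is the Bernoulli case
    mean, var = binomial_moments(n, p)
    print(f"n = {n}, p = {p}: mean = {mean:.4f} (np = {n * p:.4f}), "
          f"variance = {var:.4f} (np(1-p) = {n * p * (1 - p):.4f})")
```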
Since the probability P_i must lie between 0 and 1, we have the restriction

$$0 \le E(Y_i \mid X_i) \le 1 \qquad (15.2.5)$$

that is, the conditional expectation (or conditional probability) must lie between 0 and 1.

From the preceding discussion it would seem that OLS can be easily extended to binary dependent variable regression models. So, perhaps there is nothing new here. Unfortunately, this is not the case, for the LPM poses several problems, which are as follows:

Non-Normality of the Disturbances \(u_i\)

Although OLS does not require the disturbances (\(u_i\)) to be normally distributed, we assumed them to be so distributed for the purpose of statistical inference.³ But the assumption of normality for \(u_i\) is not tenable for the LPM because, like \(Y_i\), the disturbances \(u_i\) also take only two values; that is, they also follow the Bernoulli distribution. This can be seen clearly if we write Eq. (15.2.1) as

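$$u_i = Y_i - \beta_1 - \beta_2 X_i \qquad (15.2.6)$$

The probability distribution of \(u_i\) is

$$\begin{array}{lcc} & u_i & \text{Probability} \\ \hline \text{When } Y_i = 1 & 1 - \beta_1 - \beta_2 X_i & P_i \\ \text{When } Y_i = 0 & -\beta_1 - \beta_2 X_i & 1 - P_i \end{array} \qquad (15.2.7)$$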


Obviously, \(u_i\) cannot be assumed to be normally distributed; they follow the Bernoulli distribution.

But the nonfulfillment of the normality assumption may not be as critical as it appears, because we know that the OLS point estimates still remain unbiased (recall that, if the objective is point estimation, the normality assumption is not necessary). Besides, as the sample size increases indefinitely, statistical theory shows that the OLS estimators tend to be normally distributed generally.⁴ As a result, in large samples the statistical inference of the LPM will follow the usual OLS procedure under the normality assumption.

Heteroscedastic Variances of the Disturbances 

Even if \(E(u_i) = 0\) and \(\operatorname{cov}(u_i, u_j) = 0\) for \(i \ne j\) (i.e., no serial correlation), it can no longer be maintained that in the LPM the disturbances are homoscedastic. This is, however, not surprising. As statistical theory shows, for a Bernoulli distribution the theoretical mean and variance are, respectively, \(p\) and \(p(1 - p)\), where \(p\) is the probability of success (i.e., something happening), showing that the variance is a function of the mean. Hence the error variance is heteroscedastic.

For the distribution of the error term given in Eq. (15.2.7), applying the definition of 
variance, the reader should verify that (see Exercise 15.10) 

$$\operatorname{var}(u_i) = P_i(1 - P_i) \qquad (15.2.8)$$


3 Recall that we have recommended that the normality assumption be checked in an application by 
suitable normality tests, such as the Jarque-Bera test. 

4 The proof is based on the central limit theorem and may be found in E. Malinvaud, Statistical 
Methods of Econometrics, Rand McNally, Chicago, 1966, pp. 195-197. If the regressors are deemed 
stochastic and are jointly normally distributed, the F and t tests can still be used even though the 
disturbances are non-normal. Also keep in mind that as the sample size increases indefinitely, the 
binomial distribution converges to the normal distribution. 
That is, the variance of the error term in the LPM is heteroscedastic. Since \(P_i = E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i\), the variance of \(u_i\) ultimately depends on the values of \(X\) and hence is not homoscedastic.
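The dependence of the error variance on \(X_i\) can be made concrete with a short Python sketch that builds the two-point distribution of \(u_i\) in (15.2.7) for several incomes and computes its mean and variance directly. The parameter values beta1 = -0.9 and beta2 = 0.1 are hypothetical, chosen only so that \(P_i\) stays inside [0, 1] for the incomes used; the function name is our own.

```python
def disturbance_moments(beta1, beta2, x):
    """Mean and variance of u_i in the LPM, computed from the two-point
    distribution (15.2.7); also returns P_i(1 - P_i) from Eq. (15.2.8)."""
    p = beta1 + beta2 * x  # P_i = E(Y_i | X_i) = beta1 + beta2 * X_i
    if not 0.0 <= p <= 1.0:
        raise ValueError("beta1 + beta2 * x must lie between 0 and 1")
    support = [
        (1.0 - beta1 - beta2 * x, p),   # value of u_i when Y_i = 1, with its probability
        (-beta1 - beta2 * x, 1.0 - p),  # value of u_i when Y_i = 0, with its probability
    ]
    mean = sum(u * prob for u, prob in support)
    var = sum((u - mean) ** 2 * prob for u, prob in support)
    return mean, var, p * (1.0 - p)


BETA1, BETA2 = -0.9, 0.1  # hypothetical parameters, not estimates from the text
for x in (10, 12, 14, 16, 18):
    mean, var, formula = disturbance_moments(BETA1, BETA2, x)
    print(f"X = {x}: E(u) = {mean:+.4f}  var(u) = {var:.4f}  P(1 - P) = {formula:.4f}")
```

The mean of \(u_i\) is zero at every income, but its variance changes with \(X_i\) and equals \(P_i(1 - P_i)\) in each case, which is exactly the heteroscedasticity described above.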

We already know that, in the presence of heteroscedasticity, the OLS estimators, although unbiased, are not efficient; that is, they do not have minimum variance. But the problem of heteroscedasticity, like the problem of non-normality, is not insurmountable. In Chapter 11 we discussed several methods of handling the heteroscedasticity problem. Since the variance of \(u_i\) depends on \(E(Y_i \mid X_i)\), one way to resolve the heteroscedasticity problem is to transform the model (15.2.1) by dividing it through by

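$$\sqrt{E(Y_i \mid X_i)\,[1 - E(Y_i \mid X_i)]} = \sqrt{P_i(1 - P_i)} = \sqrt{w_i}, \text{ say}$$

that is,

$$\frac{Y_i}{\sqrt{w_i}} = \frac{\beta_1}{\sqrt{w_i}} + \beta_2 \frac{X_i}{\sqrt{w_i}} + \frac{u_i}{\sqrt{w_i}} \qquad (15.2.9)$$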


As you can readily verify, the transformed error term in Eq. (15.2.9) is homoscedastic, since \(\operatorname{var}(u_i/\sqrt{w_i}) = w_i/w_i = 1\). Therefore, after estimating Eq. (15.2.1), we can now estimate Eq. (15.2.9) by OLS, which is nothing but weighted least squares (WLS) with \(w_i\) serving as the weights.

In theory, what we have just described is fine. But in practice the true \(E(Y_i \mid X_i)\) is unknown; hence the weights \(w_i\) are unknown. To estimate \(w_i\), we can use the following two-step procedure:⁵

Step 1. Run the OLS regression (15.2.1) despite the heteroscedasticity problem and obtain \(\hat{Y}_i\) = estimate of the true \(E(Y_i \mid X_i)\). Then obtain \(\hat{w}_i = \hat{Y}_i(1 - \hat{Y}_i)\), the estimate of \(w_i\).

Step 2. Use the estimated \(w_i\) to transform the data as shown in Eq. (15.2.9) and estimate the transformed equation by OLS (i.e., weighted least squares).


Although we will illustrate this procedure for our example shortly, it may be noted that we can use White’s heteroscedasticity-corrected standard errors to deal with heteroscedasticity, provided the sample is reasonably large.

Even if we correct for heteroscedasticity, we still need to address another problem that plagues the LPM.


Nonfulfillment of \(0 \le E(Y_i \mid X_i) \le 1\)

Since \(E(Y_i \mid X_i)\) in the linear probability models measures the conditional probability of the event \(Y\) occurring given \(X\), it must necessarily lie between 0 and 1. Although this is true a priori, there is no guarantee that \(\hat{Y}_i\), the estimators of \(E(Y_i \mid X_i)\), will necessarily fulfill this restriction, and this is the real problem with the OLS estimation of the LPM. This happens because OLS does not take into account the restriction that \(0 \le E(Y_i \mid X_i) \le 1\) (an inequality restriction). There are two ways of dealing with this. One is to estimate the LPM by the usual OLS method and find out whether the estimated \(\hat{Y}_i\) lie between 0 and 1. If some are less than 0 (that is, negative), \(\hat{Y}_i\) is assumed to be zero for those cases; if they are greater than 1, they are assumed to be 1. The second procedure is to devise an estimating technique that will guarantee that the estimated conditional probabilities \(\hat{Y}_i\) will lie between 0 and 1. The logit and probit models discussed later will guarantee that the estimated probabilities will indeed lie between the logical limits 0 and 1.
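The first procedure amounts to truncating the fitted values at the logical limits. A minimal Python sketch, applied to a hypothetical list of OLS fitted values, is shown below; the function name is our own. Note that truncation only caps the reported probabilities and leaves the estimated coefficients untouched, which is one reason the text turns to the logit and probit models instead.

```python
def truncate_probabilities(fitted):
    """First procedure in the text: fitted values below 0 are set to 0 and
    fitted values above 1 are set to 1; values inside [0, 1] are unchanged."""
    return [min(max(p, 0.0), 1.0) for p in fitted]


fitted = [-0.13, 0.08, 0.48, 0.99, 1.10]  # hypothetical OLS fitted values from an LPM
print(truncate_probabilities(fitted))
```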


5 For the justification of this procedure, see Arthur S. Goldberger, Econometric Theory, John Wiley & 
Sons, New York, 1964, pp. 249-250. The justification is basically a large-sample one that we 
discussed under the topic of feasible or estimated generalized least squares in the chapter on 
heteroscedasticity (see Sec. 11.6). 




546 Part Three Topics in Econometrics 


Questionable Value of \(R^2\) as a Measure of Goodness of Fit

The conventionally computed \(R^2\) is of limited value in the dichotomous response models. To see why, consider Figure 15.1. Corresponding to a given \(X\), \(Y\) is either 0 or 1. Therefore, all the \(Y\) values will lie either along the \(X\) axis or along the line corresponding to 1. Therefore, generally no LPM is expected to fit such a scatter well, whether it is the unconstrained LPM (Figure 15.1a) or the truncated or constrained LPM (Figure 15.1b), an LPM estimated in such a way that it will not fall outside the logical band 0-1. As a result, the conventionally computed \(R^2\) is likely to be much lower than 1 for such models. In most practical applications the \(R^2\) ranges between 0.2 and 0.6. \(R^2\) in such models will be high, say, in excess of 0.8, only when the actual scatter is very closely clustered around points A and B (Figure 15.1c), for in that case it is easy to fix the straight line by joining the two points A and B. In this case the predicted \(\hat{Y}_i\) will be very close to either 0 or 1.
EXAMPLE 15.1

LPM: A Numerical Example

TABLE 15.1 Hypothetical Data on Home Ownership (Y = 1 if owns home, 0 otherwise) and Income X (thousands of dollars)


For these reasons John Aldrich and Forrest Nelson contend that “use of the coefficient of determination as a summary statistic should be avoided in models with qualitative dependent variable[s].”⁶

To illustrate some of the points made about the LPM in this section, we present a numerical example. Table 15.1 gives invented data on home ownership \(Y\) (1 = owns a house, 0 = does not own a house) and family income \(X\) (thousands of dollars) for 40 families. From these data the LPM estimated by OLS was as follows:

$$\hat{Y}_i = -0.9457 + 0.1021 X_i$$
$$\text{se} = (0.1228) \qquad (0.0082) \qquad\qquad (15.2.10)$$
$$t = (-7.6984) \qquad (12.515) \qquad R^2 = 0.8048$$

First, let us interpret this regression. The intercept of -0.9457 gives the "probability" that a family with zero income will own a house. Since this value is negative, and since probability cannot be negative, we treat this value as zero, which is sensible in the present instance.⁷ The slope value of 0.1021 means that for a unit change in income (here $1,000), on the average the probability of owning a house increases by 0.1021, or about 10 percentage points. Of course, given a particular level of income, we can estimate the actual probability of owning a house from Eq. (15.2.10). Thus, for X = 12 ($12,000), the estimated probability of owning a house is

$$(\hat{Y}_i \mid X = 12) = -0.9457 + 12(0.1021) = 0.2795$$


Family   Y    X        Family   Y    X
  1      0    8          21     1   22
  2      1   16          22     1   16
  3      1   18          23     0   12
  4      0   11          24     0   11
  5      0   12          25     1   16
  6      1   19          26     0   11
  7      1   20          27     1   20
  8      0   13          28     1   18
  9      0    9          29     0   11
 10      0   10          30     0   10
 11      1   17          31     1   17
 12      1   18          32     0   13
 13      0   14          33     1   21
 14      1   20          34     1   20
 15      0    6          35     0   11
 16      1   19          36     0    8
 17      1   16          37     1   17
 18      0   10          38     1   16
 19      0    8          39     0    7
 20      1   18          40     1   17




6 Aldrich and Nelson, op. cit., p. 15. For other measures of goodness of fit in models involving dummy regressands, see T. Amemiya, "Qualitative Response Models," Journal of Economic Literature, vol. 19, 1981, pp. 331-354.

7 One can loosely interpret the highly negative value as near improbability of owning a house when income is zero.


That is, the probability that a family with an income of $12,000 will own a house is about 28 percent. Table 15.2 shows the estimated probabilities, \(\hat{Y}_i\), for the various income levels listed in the table. The most noticeable feature of this table is that six estimated values are negative and six values are in excess of 1, demonstrating clearly the point made earlier that, although \(E(Y_i \mid X_i)\) is positive and less than 1, its estimators, \(\hat{Y}_i\), need not necessarily be positive or less than 1. This is one reason that the LPM is not the recommended model when the dependent variable is dichotomous.

Even if the estimated \(\hat{Y}_i\) were all positive and less than 1, the LPM would still suffer from the problem of heteroscedasticity, which can be seen readily from Eq. (15.2.8). As a consequence, we cannot trust the estimated standard errors reported in Eq. (15.2.10). (Why?) But we can use the weighted least-squares (WLS) procedure discussed earlier to obtain more efficient estimates. The necessary weights, \(\hat{w}_i\), required for the application of WLS are also shown in Table 15.2. But note that since some \(\hat{Y}_i\) are negative and some are in excess of one, the \(\hat{w}_i = \hat{Y}_i(1 - \hat{Y}_i)\) corresponding to these values will be negative. Thus, we cannot use these observations in WLS (why?), thereby reducing the number of observations from 40 to 28 in the present example.⁸ Omitting these observations, the WLS regression is


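$$\frac{\hat{Y}_i}{\sqrt{\hat{w}_i}} = -1.2456 \frac{1}{\sqrt{\hat{w}_i}} + 0.1196 \frac{X_i}{\sqrt{\hat{w}_i}}$$
$$\text{se} = (0.1206) \qquad (0.0069) \qquad\qquad (15.2.11)$$
$$t = (-10.332) \qquad (17.454) \qquad R^2 = 0.9214$$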


TABLE 15.2 Actual Y, Estimated Y, and Weights w_i for the Home Ownership Example

Yᵢ   Ŷᵢ        ŵᵢ‡       √ŵᵢ        Yᵢ   Ŷᵢ        ŵᵢ‡       √ŵᵢ
0    -0.129*                       1     1.301†
1     0.688    0.2147    0.4633    1     0.688    0.2147    0.4633
1     0.893    0.0956    0.3091    0     0.280    0.2016    0.4490
0     0.178    0.1463    0.3825    0     0.178    0.1463    0.3825
0     0.280    0.2016    0.4490    1     0.688    0.2147    0.4633
1     0.995    0.00498   0.0705    0     0.178    0.1463    0.3825
1     1.097†                       1     1.097†
0     0.382    0.2361    0.4859    1     0.893    0.0956    0.3091
0    -0.0265*                      0     0.178    0.1463    0.3825
0     0.076    0.0702    0.2650    0     0.076    0.0702    0.2650
1     0.791    0.1653    0.4066    1     0.791    0.1653    0.4066
1     0.893    0.0956    0.3091    0     0.382    0.2361    0.4859
0     0.484    0.2497    0.4997    1     1.199†
1     1.097†                       1     1.097†
0    -0.333*                       0     0.178    0.1463    0.3825
1     0.995    0.00498   0.0705    0    -0.129*
1     0.688    0.2147    0.4633    1     0.791    0.1653    0.4066
0     0.076    0.0702    0.2650    1     0.688    0.2147    0.4633
0    -0.129*                       0    -0.231*
1     0.893    0.0956    0.3091    1     0.791    0.1653    0.4066

* Treated as zero to avoid probabilities being negative.
† Treated as unity to avoid probabilities exceeding one.
‡ ŵᵢ = Ŷᵢ(1 - Ŷᵢ).







8 To avoid the loss of the degrees of freedom, we could let \(\hat{Y}_i = 0.01\) when the estimated \(\hat{Y}_i\) are negative and \(\hat{Y}_i = 0.99\) when they are in excess of or equal to 1. See Exercise 15.1.
These results show that, compared with Eq. (15.2.10), the estimated standard errors are smaller and, correspondingly, the estimated t ratios (in absolute value) are larger. But one should take this result with a grain of salt since in estimating Eq. (15.2.11) we had to drop 12 observations. Also, since the \(w_i\) are estimated, the usual statistical hypothesis-testing procedures are, strictly speaking, valid only in large samples (see Chapter 11).
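The two-step procedure behind Eqs. (15.2.10) and (15.2.11) can be retraced with the Python sketch below, which uses only the standard library and the 40 observations of Table 15.1. Step 1 fits the LPM by OLS; step 2 drops the families whose fitted value lies outside the open interval (0, 1), so that every estimated weight is positive, and then runs weighted least squares with weights \(1/\hat{w}_i\), which is equivalent to OLS on the transformed equation (15.2.9). Up to rounding, the results should match the published coefficients (about -0.9457 and 0.1021 for OLS, about -1.2456 and 0.1196 for WLS). The helper name weighted_fit is our own, not a library function.

```python
# Table 15.1: Y = 1 if the family owns a house, X = family income ($1,000s), families 1-40
Y = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1,
     1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
X = [8, 16, 18, 11, 12, 19, 20, 13, 9, 10, 17, 18, 14, 20, 6, 19, 16, 10, 8, 18,
     22, 16, 12, 11, 16, 11, 20, 18, 11, 10, 17, 13, 21, 20, 11, 8, 17, 16, 7, 17]


def weighted_fit(x, y, weights=None):
    """Intercept and slope of y = b1 + b2 * x minimizing sum(weights * residual**2).
    With weights=None every observation gets weight 1, i.e., ordinary least squares."""
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Step 1: OLS on all 40 families, Eq. (15.2.10)
b1_ols, b2_ols = weighted_fit(X, Y)
fitted = [b1_ols + b2_ols * xi for xi in X]
y_mean = sum(Y) / len(Y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted))
sst = sum((yi - y_mean) ** 2 for yi in Y)
r2_ols = 1 - sse / sst
n_negative = sum(f < 0 for f in fitted)
n_above_one = sum(f > 1 for f in fitted)

# Step 2: keep 0 < Y_hat < 1 so that w_hat = Y_hat * (1 - Y_hat) is positive,
# then run WLS with weights 1 / w_hat (equivalent to OLS on Eq. (15.2.9))
keep = [i for i, f in enumerate(fitted) if 0 < f < 1]
w_hat = [fitted[i] * (1 - fitted[i]) for i in keep]
b1_wls, b2_wls = weighted_fit(
    [X[i] for i in keep], [Y[i] for i in keep], [1 / w for w in w_hat]
)

print(f"OLS: Y_hat = {b1_ols:.4f} + {b2_ols:.4f} X,  R^2 = {r2_ols:.4f}")
print(f"{n_negative} fitted values < 0, {n_above_one} > 1; {len(keep)} kept for WLS")
print(f"WLS: Y_hat = {b1_wls:.4f} + {b2_wls:.4f} X")
```

The weights here come from unrounded OLS fitted values; using the three-decimal \(\hat{Y}_i\) of Table 15.2 instead shifts the WLS estimates slightly, mostly through the two families with \(X = 19\), whose tiny \(\hat{w}_i\) gives them very large weight.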


15.3 Applications of LPM 

Until computer packages for estimating the logit and probit models (to be discussed shortly) became readily accessible, the LPM was used quite extensively because of its simplicity. We now illustrate some of these applications.

EXAMPLE 15.2

Cohen-Rea-Lerman Study⁹

In a study prepared for the U.S. Department of Labor, Cohen, Rea, and Lerman were interested in examining the labor-force participation of various categories of labor as a function of several socioeconomic-demographic variables. In all their regressions, the dependent variable is a dummy, taking a value of 1 if a person is in the labor force, 0 if he or she is not. In Table 15.3 we reproduce one of their several dummy-dependent variable regressions.

Before interpreting the results, note these features: The regression in Table 15.3 was estimated by OLS. To correct for heteroscedasticity, the authors used the two-step procedure outlined previously in some of their regressions but found that the standard errors of the estimates thus obtained did not differ materially from those obtained without correction for heteroscedasticity. Perhaps this result is due to the sheer size of the sample, namely, about 25,000. Because of this large sample size, the estimated t values may be tested for statistical significance by the usual OLS procedure even though the error term takes dichotomous values. The estimated \(R^2\) of 0.175 may seem rather low, but in view of the large sample size, this \(R^2\) is still significant on the basis of the F test (see Section 8.4). Finally, notice how the authors have blended quantitative and qualitative variables and how they have taken into account the interaction effects.

Turning to the interpretations of the findings, we see that each slope coefficient gives the rate of change in the conditional probability of the event occurring for a given unit change in the value of the explanatory variable. For instance, the coefficient of -0.2753 attached to the variable "age 65 and over" means, holding all other factors constant, the probability of participation in the labor force by women in this age group is smaller by about 27 percentage points (as compared with the base category of women aged 22 to 54). By the same token, the coefficient of 0.3061 attached to the variable "16 or more years of schooling" means, holding all other factors constant, the probability of women with this much education participating in the labor force is higher by about 31 percentage points (as compared with women with less than 5 years of schooling, the base category).

Now consider the interaction term marital status and age. The table shows that the labor-force participation probability is higher by some 29 percentage points for those women who were never married (as compared with the base category) and smaller by about 28 percentage points for those women who are 65 and over (again in relation to the base category). But the interaction coefficient for women who were never married and are 65 or over lowers their participation probability by a further 20 percentage points or so. Adding up the relevant coefficients, the net effect for never-married women aged 65 and over is 0.2915 - 0.2753 - 0.2045 ≈ -0.19, compared with -0.2753 for married women (spouse present) of that age and 0.1523 - 0.2753 - 0.1391 ≈ -0.26 for the "other" category. This implies that women aged 65 and over but never married are likely to participate in the labor force more than those who are aged 65 and over and are married or fall into the "other" category.



9 Malcolm S. Cohen, Samuel A. Rea, Jr., and Robert I. Lerman, A Micro Model of Labor Supply, BLS Staff 
Paper 4, U.S. Department of Labor, 1970. 






TABLE 15.3 Labor-Force Participation Regression of women, age 22 and over, living in the largest 96 standard metropolitan statistical areas (SMSA) (dependent variable: in or out of labor force during 1966)

Explanatory Variable                                   Coefficient
Constant                                                 0.4368
Marital status
  Married, spouse present                                —
  Married, other                                         0.1523
  Never married                                          0.2915
Age
  22-54                                                  —
  55-64                                                 -0.0594
  65 and over                                           -0.2753
Years of schooling
  0-4                                                    —
  5-8                                                    0.1255
  9-11                                                   0.1704
  12-15                                                  0.2231
  16 and over                                            0.3061
Unemployment rate (1966), %
  Under 2.5                                              —
  2.5-3.4                                               -0.0213
  3.5-4.0                                               -0.0269
  4.1-5.0                                               -0.0291
  5.1 and over                                          -0.0311
Employment change (1965-1966), %
  Under 3.5                                              —
  3.5-6.49                                               0.0301
  6.5 and over                                           0.0529
Relative employment opportunities, %
  Under 62                                               —
  62-73.9                                                0.0381
  74 and over                                            0.0571
FILOW, $
  Less than 1,500 and negative                           —
  1,500-7,499                                           -0.1451
  7,500 and over                                        -0.2455
Interaction (marital status and age)
  Marital status        Age
  Other                 55-64                           -0.0406
  Other                 65 and over                     -0.1391
  Never married         55-64                           -0.1104
  Never married         65 and over                     -0.2045
Interaction (age and years of schooling completed)
  Age                   Years of schooling
  65 and over           5-8                             -0.0885
  65 and over           9-11                            -0.0848
  65 and over           12-15                           -0.1288
  65 and over           16 and over                     -0.1628

No. of observations = 25,153        R² = 0.175

-7.4 

-3.3 

-6.4 


Note: — indicates the base or omitted category. 

FILOW: family income less own wage and salary income. 

Source: Malcolm S. Cohen, Samuel A. Rea, Jr., and Robert I. Lerman, A Micro Model of Labor Supply, BLS Staff Paper 4, 
U.S. Department of Labor, 1970, Table F-6, pp. 212-213. 

Following this procedure, the reader can easily interpret the rest of the coefficients given in Table 15.3. From the given information, it is easy to obtain the estimates of the conditional probabilities of labor-force participation of the various categories. Thus, if we want to find the probability for married women (other), aged 22 to 54, with 12 to 15 years of schooling, with an unemployment rate of 2.5 to 3.4 percent, employment change of 3.5 to 6.49 percent, relative employment opportunities of 74 percent and over, and with FILOW of $7,500 and over, we obtain

$$0.4368 + 0.1523 + 0.2231 - 0.0213 + 0.0301 + 0.0571 - 0.2455 = 0.6326$$

In other words, the probability of labor-force participation by women with the preceding characteristics is estimated to be about 63 percent.
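The calculation above is just the sum of the constant and the coefficients of the non-base categories that describe the profile; base categories, and interaction terms that do not apply, contribute zero. A small Python sketch, with the coefficients copied from Table 15.3 and dictionary labels of our own choosing, makes the bookkeeping explicit:

```python
# Coefficients from Table 15.3 for the non-base categories in the profile above.
# Age 22-54 is the base category and no interaction term applies, so they add nothing.
profile_coefficients = {
    "constant": 0.4368,
    "marital status: married, other": 0.1523,
    "years of schooling: 12-15": 0.2231,
    "unemployment rate: 2.5-3.4%": -0.0213,
    "employment change: 3.5-6.49%": 0.0301,
    "relative employment opportunities: 74% and over": 0.0571,
    "FILOW: $7,500 and over": -0.2455,
}

probability = sum(profile_coefficients.values())
for name, value in profile_coefficients.items():
    print(f"{name:<50s}{value:+.4f}")
print(f"{'estimated participation probability':<50s}{probability:.4f}")
```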


EXAMPLE 15.3

Predicting a Bond Rating

Based on pooled time series and cross-sectional data of 200 Aa (high-quality) and Baa (medium-quality) bonds over the period 1961-1966, Joseph Cappelleri estimated the following bond rating prediction model.¹⁰

$$Y_i = \beta_1 + \beta_2 X_{2i}^2 + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + u_i$$

where \(Y_i = 1\) if the bond rating is Aa (Moody's rating) and \(Y_i = 0\) if the bond rating is Baa (Moody's rating);

\(X_2\) = debt capitalization ratio, a measure of leverage:

$$X_2 = \frac{\text{dollar value of long-term debt}}{\text{dollar value of total capitalization}}$$

\(X_3\) = profit rate:

$$X_3 = \frac{\text{dollar value of after-tax income}}{\text{dollar value of net total assets}}$$

\(X_4\) = standard deviation of the profit rate, a measure of profit rate variability;

\(X_5\) = net total assets (thousands of dollars), a measure of size.

A priori, \(\beta_2\) and \(\beta_4\) are expected to be negative (why?) and \(\beta_3\) and \(\beta_5\) are expected to be positive.

After correcting for heteroscedasticity and first-order autocorrelation, Cappelleri obtained the following results:¹¹

$$\hat{Y}_i = 0.6860 - 0.0179 X_{2i}^2 + 0.0486 X_{3i} + 0.0572 X_{4i} + 0.378(E{-}7) X_{5i}$$
$$\text{se} = (0.1775) \quad (0.0024) \quad (0.0486) \quad (0.0178) \quad (0.039)(E{-}8) \qquad (15.3.1)$$
$$R^2 = 0.6933$$

Note: 0.378(E-7) means 0.0000000378, etc.

All but the coefficient of \(X_4\) have the correct signs. It is left to finance students to rationalize why the profit rate variability coefficient has a positive sign, for one would expect that the greater the variability in profits, the less likely it is that Moody's would give an Aa rating, other things remaining the same.

The interpretation of the regression is straightforward. For example, 0.0486 attached to \(X_3\) means that, other things being the same, a 1 percentage point increase in the profit rate will lead on average to about a 0.05 increase in the probability of a bond getting the Aa rating. Similarly, the higher the squared leverage ratio, the lower the probability of a bond being classified as an Aa bond, by about 0.02 per unit increase in this squared ratio.


10 Joseph Cappelleri, "Predicting a Bond Rating," unpublished term paper, C.U.N.Y. The model used in the paper is a modification of the model used by Thomas F. Pogue and Robert M. Soldofsky, "What Is in a Bond Rating?" Journal of Financial and Quantitative Analysis, June 1969, pp. 201-228.

11 Some of the estimated probabilities before correcting for heteroscedasticity were negative and some were in excess of 1; in these cases they were assumed to be 0.01 and 0.99, respectively, to facilitate the computation of the weights \(w_i\).
EXAMPLE 15.4

Who Holds a Debit Card?

Like credit cards, debit cards are now used extensively by consumers. Vendors prefer them because when you use a debit card, the amount of your purchase is automatically deducted from your checking or other designated account. To find out what factors determine the use of the debit card, we obtained data on 60 customers and considered the following model:¹²

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$$

where \(Y = 1\) for a debit card holder, 0 otherwise; \(X_2\) = account balance in dollars; \(X_3\) = number of ATM transactions; \(X_4 = 1\) if interest is received on the account, 0 otherwise.

Since the linear probability model (LPM) exhibits heteroscedasticity, we present the usual OLS results and the OLS results corrected for heteroscedasticity in a tabular form.

Variable     Coefficient     Coefficient*
Constant     0.3631          0.3631
             (0.1796)**      (0.1604)**
Balance      0.00028**       0.00028**
             (0.00015)       (0.00014)
ATM          -0.0269         -0.0269
             (0.0208)        (0.0202)
Interest     -0.3019**       -0.3019**
             (0.1448)        (0.1353)
R²           0.1056          (0.1056)

Note: * denotes heteroscedasticity-corrected standard errors.

** Significant at about the 5% level.

As these results show, those who have higher account balances will tend to hold a debit card. Those who receive interest on their account balances are less likely to hold a debit card. Although the ATM variable is not significant, note that it has a negative sign. This is perhaps due to ATM transaction fees.

There is not a vast difference between the estimated standard errors with and without heteroscedasticity correction. To save space, we have not presented the fitted values (i.e., the estimated probabilities), but they all were within the limits of 0 and 1. However, there is no guarantee that this will happen in every case.


15.4 Alternatives to LPM 


As we have seen, the LPM is plagued by several problems, such as (1) non-normality of \(u_i\), (2) heteroscedasticity of \(u_i\), (3) the possibility of \(\hat{Y}_i\) lying outside the 0-1 range, and (4) the generally lower \(R^2\) values. But these problems are surmountable. For example, we can use WLS to resolve the heteroscedasticity problem or increase the sample size to minimize the non-normality problem. By resorting to restricted least-squares or mathematical programming techniques we can even make the estimated probabilities lie in the 0-1 interval.

But even then the fundamental problem with the LPM is that it is not logically a very attractive model because it assumes that \(P_i = E(Y = 1 \mid X)\) increases linearly with \(X\), that is, the marginal or incremental effect of \(X\) remains constant throughout. Thus, in our home ownership example we found that as \(X\) increases by a unit ($1,000), the probability of owning a house increases by the same constant amount of 0.10. This is so whether the income level is $8,000, $10,000, $18,000, or $22,000. This seems patently unrealistic. In reality one would expect that \(P_i\) is nonlinearly related to \(X_i\): At very low income a family will not own a house but at a sufficiently high level of income, say, \(X^*\), it most likely will own a house. Any increase in income beyond \(X^*\) will have little effect on the probability of owning a house. Thus, at both ends of the income distribution, the probability of owning a house will be virtually unaffected by a small increase in \(X\).

FIGURE 15.2 A cumulative distribution function (CDF).

12 The data used in the analysis are obtained from Douglas A. Lind, William C. Marchal, and Robert D. Mason, Statistical Techniques in Business and Economics, 11th Ed., McGraw-Hill, 2002, Appendix N, pp. 775-776. We have not used all the variables used by the authors.

Therefore, what we need is a (probability) model that has these two features: (1) As \(X_i\) increases, \(P_i = E(Y = 1 \mid X)\) increases but never steps outside the 0-1 interval, and (2) the relationship between \(P_i\) and \(X_i\) is nonlinear, that is, “one which approaches zero at slower and slower rates as \(X_i\) gets small and approaches one at slower and slower rates as \(X_i\) gets very large.”¹³

Geometrically, the model we want would look something like Figure 15.2. Notice in this model that the probability lies between 0 and 1 and that it varies nonlinearly with \(X\).

The reader will realize that the sigmoid, or S-shaped, curve in the figure very much resembles the cumulative distribution function (CDF) of a random variable.¹⁴ Therefore, one can easily use the CDF to model regressions where the response variable is dichotomous, taking 0-1 values. The practical question now is, which CDF? For although all CDFs are S shaped, for each random variable there is a unique CDF. For historical as well as practical reasons, the CDFs commonly chosen to represent the 0-1 response models are (1) the logistic and (2) the normal, the former giving rise to the logit model and the latter to the probit (or normit) model.

Although a detailed discussion of the logit and probit models is beyond the scope of this 
book, we will indicate somewhat informally how one estimates such models and how one 
interprets them. 



15.5 The Logit Model 

We will continue with our home ownership example to explain the basic ideas underlying 
the logit model. Recall that in explaining home ownership in relation to income, the LPM was 

$$P_i = \beta_1 + \beta_2 X_i \qquad (15.5.1)$$


13 John Aldrich and Forrest Nelson, op. cit., p. 26. 

14 As discussed in Appendix A, the CDF of a random variable \(X\) is simply the probability that it takes a value less than or equal to \(x_0\), where \(x_0\) is some specified numerical value of \(X\). In short, \(F(X)\), the CDF of \(X\), is \(F(X = x_0) = P(X \le x_0)\).
where \(X\) is income and \(P_i = E(Y_i = 1 \mid X_i)\) is the probability that the family owns a house. But now consider the following representation of home ownership:

$$P_i = E(Y = 1 \mid X_i) = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}} \qquad (15.5.2)$$

For ease of exposition, we write Eq. (15.5.2) as

$$P_i = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}} \qquad (15.5.3)$$

where \(Z_i = \beta_1 + \beta_2 X_i\).

Equation (15.5.3) represents what is known as the (cumulative) logistic distribution function.¹⁵

It is easy to verify that as \(Z_i\) ranges from \(-\infty\) to \(+\infty\), \(P_i\) ranges between 0 and 1 and that \(P_i\) is nonlinearly related to \(Z_i\) (i.e., \(X_i\)), thus satisfying the two requirements considered earlier.¹⁶ But it seems that in satisfying these requirements, we have created an estimation problem because \(P_i\) is nonlinear not only in \(X\) but also in the \(\beta\)'s, as can be seen clearly from Eq. (15.5.2). This means that we cannot use the familiar OLS procedure to estimate the parameters.¹⁷ But this problem is more apparent than real because Eq. (15.5.2) can be linearized, which can be shown as follows.

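If \(P_i\), the probability of owning a house, is given by Eq. (15.5.3), then \(1 - P_i\), the probability of not owning a house, is

$$1 - P_i = \frac{1}{1 + e^{Z_i}} \qquad (15.5.4)$$

Therefore, we can write

$$\frac{P_i}{1 - P_i} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i} \qquad (15.5.5)$$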


Now \(P_i/(1 - P_i)\) is simply the odds ratio in favor of owning a house: the ratio of the probability that a family will own a house to the probability that it will not own a house. Thus, if \(P_i = 0.8\), it means that the odds are 4 to 1 in favor of the family owning a house.

Now if we take the natural log of Eq. (15.5.5), we obtain a very interesting result, namely,

$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_1 + \beta_2 X_i \qquad (15.5.6)$$


15 The logistic model has been used extensively in analyzing growth phenomena, such as population, GNP, money supply, etc. For theoretical and practical details of logit and probit models, see J. S. Cramer, The Logit Model for Economists, Edward Arnold Publishers, London, 1991; and G. S. Maddala, op. cit.

16 Note that as \(Z_i \to +\infty\), \(e^{-Z_i}\) tends to zero and as \(Z_i \to -\infty\), \(e^{-Z_i}\) increases indefinitely. Recall that \(e = 2.71828\).

17 Of course, one could use nonlinear estimation techniques discussed in Chapter 14. See also 
Section 15.8. 
that is, \(L\), the log of the odds ratio, is not only linear in \(X\), but also (from the estimation viewpoint) linear in the parameters.¹⁸ \(L\) is called the logit, and hence the name logit model for models like Eq. (15.5.6).
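The chain of results from Eq. (15.5.3) to Eq. (15.5.6) can be checked numerically. The Python sketch below maps a few arbitrary values of \(Z_i\) to probabilities, converts them to odds and logits, and confirms that the odds equal \(e^{Z_i}\) and that the logit recovers \(Z_i\); it also reproduces the 4-to-1 odds for \(P_i = 0.8\). The function names are our own.

```python
import math


def probability_from_z(z):
    """P_i = 1 / (1 + e^(-Z_i)), Eq. (15.5.3)."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds ratio in favor of the event, P_i / (1 - P_i), as in Eq. (15.5.5)."""
    return p / (1.0 - p)


def logit(p):
    """Logit L_i = ln[P_i / (1 - P_i)], Eq. (15.5.6)."""
    return math.log(odds(p))


print(f"P = 0.8 gives odds of {odds(0.8):.2f} to 1")
for z in (-3.0, -0.5, 0.0, 1.2, 3.0):
    p = probability_from_z(z)
    print(f"Z = {z:+.1f}: P = {p:.4f}, odds = {odds(p):.4f} "
          f"(e^Z = {math.exp(z):.4f}), logit = {logit(p):+.4f}")
```

The probabilities stay inside (0, 1), while the logits track \(Z_i\) exactly and are unbounded in either direction.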

Notice these features of the logit model. 

1. As P goes from 0 to 1 (i.e., as Z varies from — oo to +oo), the logit L goes from —oo 
to +oo. That is, although the probabilities (of necessity) lie between 0 and 1, the logits are 
not so bounded. 

2. Although L is linear in X, the probabilities themselves are not. This property is in 
contrast with the LPM model (15.5.1) where the probabilities increase linearly with X} 9 

3. Although we have included only a single X variable, or regressor, in the preceding 
model, one can add as many regressors as may be dictated by the underlying theory. 

4. If L, the logit, is positive, it means that when the value of the regressor(s) increases, 
the odds that the regressand equals 1 (meaning some event of interest happens) increases. 
If L is negative, the odds that the regressand equals 1 decreases as the value of X increases. 
To put it differently, the logit becomes negative and increasingly large in magnitude as the 
odds ratio decreases from 1 to 0 and becomes increasingly large and positive as the odds 
ratio increases from 1 to infinity. 20 

5. More formally, the interpretation of the logit model given in Eq. (15.5.6) is as follows: 
P 2, the slope, measures the change in L for a unit change in X, that is, it tells how the log- 
odds in favor of owning a house change as income changes by a unit, say, $1,000. The 
intercept is the value of the log-odds in favor of owning a house if income is zero. Like 
most interpretations of intercepts, this interpretation may not have any physical meaning. 

6. Given a certain level of income, say, \(X^*\), if we actually want to estimate not the odds in favor of owning a house but the probability of owning a house itself, this can be done directly from Eq. (15.5.3) once the estimates of \(\beta_1\) and \(\beta_2\) are available. This, however, raises the most important question: How do we estimate \(\beta_1\) and \(\beta_2\) in the first place? The answer is given in the next section.

7. Whereas the LPM assumes that \(P_i\) is linearly related to \(X_i\), the logit model assumes that the log of the odds ratio is linearly related to \(X_i\).
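The features listed above can be checked numerically. The following is a minimal Python sketch, not part of the original text. The helper names `logit`, `logistic`, and `dp_dx` and the coefficient values are illustrative assumptions. The sketch maps probabilities to logits and back, and it evaluates the slope \(dP/dX = \beta_2 P(1 - P)\) from footnote 19.

```python
import math

def logit(p: float) -> float:
    """Log of the odds ratio, L = ln[P / (1 - P)], defined for 0 < P < 1."""
    return math.log(p / (1.0 - p))

def logistic(z: float) -> float:
    """Inverse of the logit: P = 1 / (1 + e^(-Z)), strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def dp_dx(p: float, beta2: float) -> float:
    """Slope of P with respect to X at probability level p: beta2 * P * (1 - P)."""
    return beta2 * p * (1.0 - p)

# Feature 1: the logit is unbounded although P lies between 0 and 1.
for p in (0.001, 0.1, 0.5, 0.9, 0.999):
    print(f"P = {p:<6} L = {logit(p):8.4f}")

# Feature 2: L is linear in X, but P is not (equal steps in X, unequal steps in P).
beta1, beta2 = -1.6, 0.08  # illustrative values only, not estimates
print([round(logistic(beta1 + beta2 * x), 4) for x in (0, 10, 20, 30, 40)])

# Footnote 19: the effect of X on P is largest at P = 0.5.
print(max((0.1, 0.3, 0.5, 0.7, 0.9), key=lambda p: dp_dx(p, beta2)))
```

Working with `logistic` and `logit` as a pair makes the point of feature 2 concrete: the model is linear on the logit scale, and any probability statement requires passing back through the logistic function.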


15.6 Estimation of the Logit Model 


For estimation purposes, we write Eq. (15.5.6) as follows:

\[ L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 X_i + u_i \qquad (15.6.1) \]


We will discuss the properties of the stochastic error term \(u_i\) shortly.


18 Recall that the linearity assumption of OLS does not require that the X variable be necessarily linear. So we can have \(X^2\), \(X^3\), etc., as regressors in the model. For our purpose, it is linearity in the parameters that is crucial.

19 Using calculus, it can be shown that \(dP/dX = \beta_2 P(1 - P)\). This shows that the rate of change in probability with respect to X involves not only \(\beta_2\) but also the level of probability from which the change is measured (but more on this in Section 15.7). In passing, note that the effect of a unit change in \(X_i\) on \(P_i\) is greatest when P = 0.5 and least when P is close to 0 or 1.

20 This point is due to David Carson. 
To estimate Eq. (15.6.1), we need, apart from \(X_i\), the values of the regressand, or logit, \(L_i\). This depends on the type of data we have for analysis. We distinguish two types of data: (1) data at the individual, or micro, level, and (2) grouped or replicated data.

Data at the Individual Level 

If we have data on individual families, as in the case of Table 15.1, OLS estimation of Eq. (15.6.1) is infeasible. This is easy to see. In terms of the data given in Table 15.1, \(P_i = 1\) if a family owns a house and \(P_i = 0\) if it does not own a house. But if we put these values directly into the logit \(L_i\), we obtain:

\[ L_i = \ln\left(\frac{1}{0}\right) \quad \text{if a family owns a house} \]
\[ L_i = \ln\left(\frac{0}{1}\right) \quad \text{if a family does not own a house} \]

Obviously, these expressions are meaningless. Therefore, if we have data at the micro, or individual, level, we cannot estimate Eq. (15.6.1) by the standard OLS routine. In this situation we may have to resort to the maximum-likelihood (ML) method to estimate the parameters. Although the rudiments of this method were discussed in the appendix to Chapter 4, its application in the present context will be discussed in Appendix 15A, Section 15A.1, for the benefit of readers who would like to learn more about it.²¹ Software packages, such as MICROFIT, EViews, LIMDEP, SHAZAM, PC-GIVE, STATA, and MINITAB, have built-in routines to estimate the logit model at the individual level. We will illustrate the use of the ML method later in the chapter.


Grouped or Replicated Data 

Now consider the data given in Table 15.4. This table gives data on several families grouped or replicated (repeat observations) according to income level, and the number of families owning a house at each income level. Corresponding to each income level \(X_i\), there are \(N_i\) families, \(n_i\) among whom are homeowners (\(n_i \le N_i\)). Therefore, if we compute

\[ \hat{P}_i = \frac{n_i}{N_i} \qquad (15.6.2) \]


TABLE 15.4
Hypothetical Data on X_i (Income), N_i (Number of Families at Income X_i), and n_i (Number of Families Owning a House)

X (thousands of dollars)    N_i    n_i
 6                           40      8
 8                           50     12
10                           60     18
13                           80     28
15                          100     45
20                           70     36
25                           65     39
30                           50     33
35                           40     30
40                           25     20


21 For a comparatively simple discussion of maximum likelihood in the context of the logit model, see John Aldrich and Forrest Nelson, op. cit., pp. 49-54. See also Alfred DeMaris, Logit Modeling: Practical Applications, Sage Publications, Newbury Park, Calif., 1992.
that is, the relative frequency, we can use it as an estimate of the true \(P_i\) corresponding to each \(X_i\). If \(N_i\) is fairly large, \(\hat{P}_i\) will be a reasonably good estimate of \(P_i\).²² Using the estimated \(\hat{P}_i\), we can obtain the estimated logit as

\[ \hat{L}_i = \ln\left(\frac{\hat{P}_i}{1 - \hat{P}_i}\right) = \hat{\beta}_1 + \hat{\beta}_2 X_i \qquad (15.6.3) \]


which will be a fairly good estimate of the true logit \(L_i\) if the number of observations \(N_i\) at each \(X_i\) is reasonably large.
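The relative frequencies of Eq. (15.6.2) and the estimated logits of Eq. (15.6.3) can be computed directly from Table 15.4. Below is a minimal Python sketch. The function name `grouped_logits` is hypothetical. The optional `half_correction` flag applies the adjustment described in footnote 24.

```python
import math

# Table 15.4: (income X in $1,000s, families N_i, homeowners n_i)
TABLE_15_4 = [
    (6, 40, 8), (8, 50, 12), (10, 60, 18), (13, 80, 28), (15, 100, 45),
    (20, 70, 36), (25, 65, 39), (30, 50, 33), (35, 40, 30), (40, 25, 20),
]

def grouped_logits(rows, half_correction=False):
    """Return (X, P_hat, L_hat) for each income class.

    With half_correction=True, 1/2 is added to the counts (footnote 24),
    which keeps L_hat finite even when n_i = 0 or n_i = N_i.
    """
    out = []
    for x, n_fam, n_own in rows:
        p_hat = n_own / n_fam                                    # Eq. (15.6.2)
        if half_correction:
            l_hat = math.log((n_own + 0.5) / (n_fam - n_own + 0.5))
        else:
            l_hat = math.log(p_hat / (1.0 - p_hat))              # Eq. (15.6.3)
        out.append((x, p_hat, l_hat))
    return out

for x, p, l in grouped_logits(TABLE_15_4):
    print(f"X = {x:>2}  P_hat = {p:.4f}  L_hat = {l:8.4f}")
```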

In short, given grouped or replicated data, such as Table 15.4, one can obtain the data on the dependent variable, the logits, to estimate the model (15.6.1). Can we then apply OLS to Eq. (15.6.3) and estimate the parameters in the usual fashion? The answer is, not quite, since we have not yet said anything about the properties of the stochastic disturbance term. It can be shown that if \(N_i\) is fairly large and if each observation in a given income class \(X_i\) is distributed independently as a binomial variable, then

\[ u_i \sim N\left(0, \frac{1}{N_i P_i (1 - P_i)}\right) \qquad (15.6.4) \]


that is, \(u_i\) follows the normal distribution with zero mean and variance equal to \(1/[N_i P_i (1 - P_i)]\).²³

Therefore, as in the case of the LPM, the disturbance term in the logit model is heteroscedastic. Thus, instead of using OLS we will have to use weighted least squares (WLS). For empirical purposes, however, we will replace the unknown \(P_i\) by \(\hat{P}_i\) and use

\[ \hat{\sigma}^2 = \frac{1}{N_i \hat{P}_i (1 - \hat{P}_i)} \qquad (15.6.5) \]

as estimator of \(\sigma^2\).

We now describe the various steps in estimating the logit regression in Eq. (15.6.1): 


1. For each income level \(X_i\), compute the probability of owning a house as \(\hat{P}_i = n_i/N_i\).

2. For each \(X_i\), obtain the logit as²⁴

\[ \hat{L}_i = \ln\left[\frac{\hat{P}_i}{1 - \hat{P}_i}\right] \]

3. To resolve the problem of heteroscedasticity, transform Eq. (15.6.1) as follows:²⁵

\[ \sqrt{w_i}\, L_i = \beta_1 \sqrt{w_i} + \beta_2 \sqrt{w_i}\, X_i + \sqrt{w_i}\, u_i \qquad (15.6.6) \]


22 From elementary statistics recall that the probability of an event is the limit of the relative frequency 
as the sample size becomes infinitely large. 

23 As shown in elementary probability theory, \(\hat{P}_i\), the proportion of successes (here, owning a house), follows the binomial distribution with mean equal to the true \(P_i\) and variance equal to \(P_i(1 - P_i)/N_i\). As \(N_i\) increases indefinitely, the binomial distribution approximates the normal distribution. The distributional properties of \(u_i\) given in Eq. (15.6.4) follow from this basic theory. For details, see Henri Theil, "On the Relationships Involving Qualitative Variables," American Journal of Sociology, vol. 76, July 1970, pp. 103-154.

24 Since \(\hat{P}_i = n_i/N_i\), \(\hat{L}_i\) can be alternatively expressed as \(\hat{L}_i = \ln[n_i/(N_i - n_i)]\). In passing, note that to avoid \(\hat{P}_i\) taking the value of 0 or 1, in practice \(\hat{L}_i\) is measured as \(\hat{L}_i = \ln[(n_i + \tfrac{1}{2})/(N_i - n_i + \tfrac{1}{2})] = \ln[(\hat{P}_i + 1/(2N_i))/(1 - \hat{P}_i + 1/(2N_i))]\). It is recommended as a rule of thumb that \(N_i\) be at least 5 at each value of \(X_i\). For additional details, see D. R. Cox, Analysis of Binary Data, Methuen, London, 1970, p. 33.

25 If we estimate Eq. (15.6.1) disregarding heteroscedasticity, the estimators, although unbiased, will not be efficient, as we know from Chapter 11.
which we write

\[ L_i^* = \beta_1 \sqrt{w_i} + \beta_2 X_i^* + v_i \qquad (15.6.7) \]

where the weights are \(w_i = N_i \hat{P}_i (1 - \hat{P}_i)\); \(L_i^*\) = transformed or weighted \(L_i\); \(X_i^*\) = transformed or weighted \(X_i\); and \(v_i\) = transformed error term. It is easy to verify that the transformed error term \(v_i\) is homoscedastic, keeping in mind that the original error variance is \(\sigma^2 = 1/[N_i P_i (1 - P_i)]\).

4. Estimate Eq. (15.6.6) by OLS—recall that WLS is OLS on the transformed data. 
Notice that in Eq. (15.6.6) there is no intercept term introduced explicitly (why?). 
Therefore, one will have to use the regression through the origin routine to estimate 
Eq. (15.6.6). 

5. Establish confidence intervals and/or test hypotheses in the usual OLS framework, but 
keep in mind that all the conclusions will be valid strictly speaking only if the sample is 
reasonably large (why?). Therefore, in small samples, the estimated results should be 
interpreted carefully. 
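The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: plain-Python arithmetic, the exact \(\hat{P}_i = n_i/N_i\) from Table 15.4, and a hypothetical function name, `glogit_wls`. It is not the routine used to produce Table 15.5 or Eq. (15.7.1). Because WLS here is OLS through the origin on \(\sqrt{w_i}\) and \(X_i^* = \sqrt{w_i}\,X_i\), the 2 × 2 normal equations can be solved directly. The estimates should come out close to Eq. (15.7.1), although the trailing digits may differ, for example because of rounding in the \(\hat{P}_i\).

```python
import math

# Table 15.4: (income X in $1,000s, families N_i, homeowners n_i)
TABLE_15_4 = [
    (6, 40, 8), (8, 50, 12), (10, 60, 18), (13, 80, 28), (15, 100, 45),
    (20, 70, 36), (25, 65, 39), (30, 50, 33), (35, 40, 30), (40, 25, 20),
]

def glogit_wls(rows):
    """Grouped logit by WLS: OLS through the origin of sqrt(w)*L on sqrt(w) and sqrt(w)*X."""
    # Cross-products for the normal equations of the two transformed regressors.
    s11 = s12 = s22 = t1 = t2 = 0.0
    for x, n_fam, n_own in rows:
        p = n_own / n_fam                  # step 1: relative frequency
        logit = math.log(p / (1.0 - p))    # step 2: estimated logit
        w = n_fam * p * (1.0 - p)          # weight w_i = N_i P_i (1 - P_i)
        r = math.sqrt(w)
        z1, z2, y = r, r * x, r * logit    # step 3: transformed variables
        s11 += z1 * z1
        s12 += z1 * z2
        s22 += z2 * z2
        t1 += z1 * y
        t2 += z2 * y
    # Step 4: solve the 2x2 normal equations (no separate intercept is added).
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2

b1, b2 = glogit_wls(TABLE_15_4)
print(f"coefficient on sqrt(w): {b1:.5f}, coefficient on X*: {b2:.5f}")
```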


15.7 The Grouped Logit (Glogit) Model: A Numerical Example 

To illustrate the theory just discussed, we will use the data given in Table 15.4. Since the data in the table are grouped, the logit model based on these data will be called a grouped logit model, glogit for short. The raw data and other calculations necessary to implement glogit are given in Table 15.5. Note that there is no intercept in Eq. (15.6.7); hence the regression-through-the-origin procedure is appropriate here. The results of the weighted least-squares regression (15.6.7) based on the data given in Table 15.5 are as follows:

\[
\begin{aligned}
\hat{L}_i^* &= -1.59474\sqrt{w_i} + 0.07862\,X_i^* \\
\text{se} &= (0.11046) \qquad\quad (0.00539) \\
t &= (-14.43619) \qquad (14.56675) \qquad R^2 = 0.9642
\end{aligned}
\qquad (15.7.1)
\]

The \(R^2\) is the squared correlation coefficient between actual and estimated \(L_i^*\). \(L_i^*\) and \(X_i^*\) are weighted \(L_i\) and \(X_i\), as shown in Eq. (15.6.6). Although we have shown the calculations of the grouped logit in Table 15.5 for pedagogical reasons, this can be done easily by invoking the glogit (grouped logit) command in STATA.

Interpretation of the Estimated Logit Model 

How do we interpret Eq. (15.7.1)? There are various ways, some intuitive and some not:

Logit Interpretation

As Eq. (15.7.1) shows, the estimated slope coefficient suggests that for a unit ($1,000) increase in weighted income, the weighted log of the odds in favor of owning a house goes up by 0.08 units. This mechanical interpretation, however, is not very appealing.

Odds Interpretation 

Remember that \(L_i = \ln[P_i/(1 - P_i)]\). Therefore, taking the antilog of the estimated logit, we get \(\hat{P}_i/(1 - \hat{P}_i)\), that is, the odds ratio. Hence, taking the antilog of Eq. (15.7.1), we obtain:

\[ \frac{\hat{P}_i}{1 - \hat{P}_i} = e^{-1.59474\sqrt{w_i} + 0.07862 X_i^*} = e^{-1.59474\sqrt{w_i}} \cdot e^{0.07862 X_i^*} \qquad (15.7.2) \]


Using a calculator, you can easily verify that \(e^{0.07862} \approx 1.0818\). This means that for a unit increase in weighted income, the (weighted) odds in favor of owning a house are multiplied by about 1.0818; that is, they increase by about 8.18 percent. In general, take the antilog of the jth slope coefficient (if there is more than one regressor in the model), subtract 1 from it, and multiply the result by 100. The result is the percent change in the odds for a unit increase in the jth regressor.
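The antilog arithmetic above is easy to reproduce. In the minimal Python sketch below, the helper name `odds_percent_change` is hypothetical. It applies the rule just stated, \(100(e^{\hat{\beta}_j} - 1)\), to the slope in Eq. (15.7.1).

```python
import math

def odds_percent_change(beta: float) -> float:
    """Percent change in the odds for a unit increase in the regressor: 100 (e^beta - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

b2 = 0.07862  # slope on weighted income, Eq. (15.7.1)
print(f"odds multiplier = {math.exp(b2):.4f}")              # 1.0818
print(f"percent change  = {odds_percent_change(b2):.2f}%")  # 8.18%
```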

Incidentally, if you want to carry out the analysis in terms of the unweighted logit, all you have to do is divide the estimated \(L_i^*\) by \(\sqrt{w_i}\). Table 15.6 gives the estimated weighted and unweighted logits for each observation and some other data, which we will discuss shortly.

Computing Probabilities 

Since the language of logit and odds ratio may be unfamiliar to some, we can always compute the probability of owning a house at a certain level of income. Suppose we want to compute this probability at X = 20 ($20,000). Plugging this value into Eq. (15.7.1), we obtain \(\hat{L}_i^* = -0.09311\), and dividing this by \(\sqrt{w_i} = 4.1816\) (see Table 15.5), we obtain \(\hat{L}_i = -0.02226\). Therefore, at the income level of $20,000, we have

\[ -0.02226 = \ln\left(\frac{\hat{P}_i}{1 - \hat{P}_i}\right) \]

Therefore,

\[ \frac{\hat{P}_i}{1 - \hat{P}_i} = e^{-0.02226} = 0.97799 \]

Solving this for \(\hat{P}_i\),

\[ \hat{P}_i = \frac{e^{-0.02226}}{1 + e^{-0.02226}} \]

the reader can see that the estimated probability is about 0.4944. That is, given the income of $20,000, the probability of a family owning a house is about 49 percent. Table 15.6 shows the probabilities thus computed at various income levels. As this table shows, the probability of house ownership increases with income, but not linearly as with the LPM model.

TABLE 15.6
Lstar, Xstar, Estimated Lstar, Logit, Probability, and Change in Probability

Lstar       Xstar        ELstar      Logit       Probability, P   Change in Probability*
-3.50710     15.1788     -2.84096    -1.12299    0.24545          0.01456
-3.48070     24.15920    -2.91648    -0.96575    0.27572          0.01570
-3.48070     35.49600    -2.86988    -0.80850    0.30821          0.01676
-2.64070     55.45930    -2.44293    -0.57263    0.36063          0.01813
-0.99850     74.62350    -2.06652    -0.41538    0.39762          0.01883
 0.16730     83.65060    -0.09311    -0.02226    0.49443          0.01965
 1.60120     98.74250     1.46472     0.37984    0.59166          0.01899
 2.22118    100.48800     2.55896     0.76396    0.68221          0.01704
 3.00860     95.84050     3.16794     1.15677    0.76074          0.01431
 2.77260     80.00000     3.10038     1.55019    0.82494          0.01135

Notes: Lstar and Xstar are from Table 15.5. ELstar is the estimated Lstar. Logit is the unweighted logit. Probability is the probability of owning a house.
*Change in probability is the change per unit change in income, computed from \(\hat{\beta}_2 \hat{P}(1 - \hat{P}) = 0.07862\,\hat{P}(1 - \hat{P})\).

FIGURE 15.3 Change in probability in relation to income.
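The probability calculation above can be sketched in Python. Because \(L_i^* = \sqrt{w_i}\,L_i\) and \(X_i^* = \sqrt{w_i}\,X_i\), dividing the fitted Eq. (15.7.1) by \(\sqrt{w_i}\) gives the unweighted logit \(\hat{\beta}_1 + \hat{\beta}_2 X_i\) directly, so the value of \(\sqrt{w_i}\) cancels. The variable names are illustrative. The logit obtained this way (about -0.0223) differs slightly in the fourth decimal from the -0.02226 quoted in the text, presumably because of rounding in the tabulated \(X_i^*\). The implied probability still rounds to 0.4944.

```python
import math

b1, b2 = -1.59474, 0.07862   # estimates from Eq. (15.7.1)
sqrt_w = 4.1816              # sqrt(w_i) at X = 20, as quoted in the text
x = 20.0

l_star = b1 * sqrt_w + b2 * (sqrt_w * x)   # weighted logit, with X* = sqrt(w) * X
logit = l_star / sqrt_w                    # unweighted logit = b1 + b2 * x
odds = math.exp(logit)                     # P / (1 - P)
p = odds / (1.0 + odds)                    # solve the odds for P

print(f"L* = {l_star:.5f}, L = {logit:.5f}, P = {p:.4f}")
```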

Computing the Rate of Change of Probability 

As you can gather from Table 15.6, the probability of owning a house depends on the income level. How can we compute the rate of change of probabilities as income varies? As noted in footnote 19, that depends not only on the estimated slope coefficient \(\hat{\beta}_2\) but also on the level of the probability from which the change is measured; the latter of course depends on the income level at which the probability is computed.

To illustrate, suppose we want to measure the change in the probability of owning a house at the income level $20,000. Then, from footnote 19, the change in probability for a unit increase in income from the level 20 (thousand) is \(\hat{\beta}_2(1 - \hat{P})\hat{P} = 0.07862(0.5056)(0.4944) \approx 0.01965\).

It is left as an exercise for the reader to show that at income level $40,000, the change in probability is 0.01135. Table 15.6 shows the change in probability of owning a house at various income levels; these changes are also depicted in Figure 15.3.
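The rate-of-change calculation can be sketched in Python as follows. The function names `prob` and `rate_of_change` are hypothetical. The sketch evaluates \(\hat{\beta}_2 \hat{P}(1 - \hat{P})\) at $20,000 and at $40,000, the income level in the exercise above, using the unweighted logit \(\hat{\beta}_1 + \hat{\beta}_2 X\) implied by Eq. (15.7.1).

```python
import math

B1, B2 = -1.59474, 0.07862   # estimates from Eq. (15.7.1)

def prob(x: float) -> float:
    """Estimated probability of owning a house at income x ($1,000s)."""
    return 1.0 / (1.0 + math.exp(-(B1 + B2 * x)))

def rate_of_change(x: float) -> float:
    """dP/dX = b2 * P * (1 - P), evaluated at income x."""
    p = prob(x)
    return B2 * p * (1.0 - p)

for x in (20, 40):
    print(f"X = {x}: P = {prob(x):.4f}, dP/dX = {rate_of_change(x):.5f}")
```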

To conclude our discussion of the glogit model, we present the results based on OLS, or 
unweighted regression, for the home ownership example: 

\[
\begin{aligned}
\hat{L}_i &= -1.6587 + 0.0792\,X_i \\
\text{se} &= (0.0958) \qquad (0.0041) \\
t &= (-17.32) \qquad (19.11) \qquad r^2 = 0.9786
\end{aligned}
\qquad (15.7.3)
\]

We leave it to the reader to compare this regression with the weighted least-squares regression given by Eq. (15.7.1).
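For comparison, the unweighted regression (15.7.3) is ordinary least squares of \(\hat{L}_i\) on \(X_i\) with an intercept. The following is a minimal plain-Python sketch. It uses the exact relative frequencies from Table 15.4, and the function name `ols_with_se` is hypothetical. It also reports the conventional OLS standard errors and \(r^2\), which ignore the heteroscedasticity discussed in Section 15.6.

```python
import math

# Table 15.4: (income X in $1,000s, families N_i, homeowners n_i)
TABLE_15_4 = [
    (6, 40, 8), (8, 50, 12), (10, 60, 18), (13, 80, 28), (15, 100, 45),
    (20, 70, 36), (25, 65, 39), (30, 50, 33), (35, 40, 30), (40, 25, 20),
]

def ols_with_se(x, y):
    """Simple OLS of y on a constant and x; returns (b1, b2, se_b1, se_b2, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b2 = sxy / sxx
    b1 = my - b2 * mx
    rss = sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, y))
    s2 = rss / (n - 2)                               # residual variance, n - 2 df
    se_b2 = math.sqrt(s2 / sxx)
    se_b1 = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return b1, b2, se_b1, se_b2, 1.0 - rss / syy

x = [row[0] for row in TABLE_15_4]
y = [math.log(n_own / (n_fam - n_own)) for _, n_fam, n_own in TABLE_15_4]  # ln[n/(N - n)]
b1, b2, se1, se2, r2 = ols_with_se(x, y)
print(f"L = {b1:.4f} + {b2:.4f} X   se = ({se1:.4f}) ({se2:.4f})   r2 = {r2:.4f}")
```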


15.8 The Logit Model for Ungrouped or Individual Data 


To set the stage, consider the data given in Table 15.7. Letting Y = 1 if a student’s final 
grade in an intermediate microeconomics course was A and Y = 0 if the final grade 
was a B or a C, Spector and Mazzeo used grade point average (GPA), TUCE, and 
TABLE 15.7 Data on the Effect of Personalized System of Instruction (PSI) on Course Grades

Observation   GPA    TUCE   PSI   Grade   Letter Grade
 1            2.66    20     0     0       C
 2            2.89    22     0     0       B
 3            3.28    24     0     0       B
 4            2.92    12     0     0       B
 5            4.00    21     0     1       A
 6            2.86    17     0     0       B
 7            2.76    17     0     0       B
 8            2.87    21     0     0       B
 9            3.03    25     0     0       C
10            3.92    29     0     1       A
11            2.63    20     0     0       C
12            3.32    23     0     0       B
13            3.57    23     0     0       B
14            3.26    25     0     1       A
15            3.53    26     0     0       B
16            2.74    19     0     0       B
17            2.75    25     0     0       C
18            2.83    19     0     0       C
19            3.12    23     1     0       B
20            3.16    25     1     1       A
21            2.06    22     1     0       C
22            3.62    28     1     1       A
23            2.89    14     1     0       C
24            3.51    26     1     0       B
25            3.54    24     1     1       A
26            2.83    27     1     1       A
27            3.39    17     1     1       A
28            2.67    24     1     0       B
29            3.65    21     1     1       A
30            4.00    23     1     1       A
31            3.10    21     1     0       C
32            2.39    19     1     1       A

Notes: Grade Y = 1 if the final grade is A
               = 0 if the final grade is B or C
TUCE = score on an examination given at the beginning of the term to test entering knowledge of macroeconomics
PSI = 1 if the new teaching method is used
    = 0 otherwise
GPA = the entering grade point average

Source: L. Spector and M. Mazzeo, "Probit Analysis and Economic Education," Journal of Economic Education, vol. 11, 1980, pp. 37-44.


Personalized System of Instruction (PSI) as the grade predictors. The logit model here can be written as:

\[ L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 \text{GPA}_i + \beta_3 \text{TUCE}_i + \beta_4 \text{PSI}_i + u_i \qquad (15.8.1) \]


As we noted in Section 15.6, with individual data we cannot simply put \(P_i = 1\) if a student earns an A and \(P_i = 0\) otherwise, since the logit \(\ln[P_i/(1 - P_i)]\) is then undefined. Here neither OLS nor weighted least squares (WLS) is helpful. We have to resort to nonlinear estimating procedures using the method of maximum likelihood. The details of this method are given in Appendix 15A, Section 15A.1. Since most modern statistical packages have routines to estimate logit models on the basis of ungrouped data, we will present the results of model (15.8.1) using the data given in Table 15.7 and show how to interpret the results. The results are given in Table 15.8 in tabular form and are obtained by using EViews 6. Before interpreting these results, some general observations are in order.


1. Since we are using the method of maximum likelihood, which is generally a large- 
sample method, the estimated standard errors are asymptotic. 

2. As a result, instead of using the t statistic to evaluate the statistical significance of a 
coefficient, we use the (standard normal) Z statistic. So inferences are based on the normal 
table. Recall that if the sample size is reasonably large, the t distribution converges to the 
normal distribution. 

3. As noted earlier, the conventional measure of goodness of fit, \(R^2\), is not particularly meaningful in binary regressand models. Measures similar to \(R^2\), called pseudo \(R^2\), are
TABLE 15.8 Regression Results of Equation (15.8.1)

Dependent Variable: Grade
Method: ML-Binary Logit
Convergence achieved after 5 iterations

Variable   Coefficient   Std. Error   Z Statistic   Probability
C          -13.0213      4.931        -2.6405       0.0082
GPA          2.8261      1.2629        2.2377       0.0252
TUCE         0.0951      0.1415        0.67223      0.5014
PSI          2.3786      1.0645        2.2345       0.0255

McFadden R² = 0.3740    LR statistic (3 df) = 15.40419


available, and there are a variety of them.²⁶ EViews presents one such measure, the McFadden \(R^2\), denoted by \(R^2_{\text{McF}}\), whose value in our example is 0.3740.²⁷ Like \(R^2\), \(R^2_{\text{McF}}\) also ranges between 0 and 1. Another comparatively simple measure of goodness of fit is the count \(R^2\), which is defined as:

\[ \text{Count } R^2 = \frac{\text{number of correct predictions}}{\text{total number of observations}} \qquad (15.8.2) \]

Since the regressand in the logit model takes a value of 1 or zero, if the predicted probability is greater than 0.5, we classify that as 1, but if it is less than 0.5, we classify that as 0. We then count the number of correct predictions and compute the \(R^2\) as given in Eq. (15.8.2). We will illustrate this shortly.

It should be noted, however, that in binary regressand models, goodness of fit is of secondary importance. What matters is whether the regression coefficients have the expected signs and whether they are statistically and/or practically significant.

4. To test the null hypothesis that all the slope coefficients are simultaneously equal to zero, the equivalent of the F test in the linear regression model is the likelihood ratio (LR) statistic. Given the null hypothesis, the LR statistic follows the \(\chi^2\) distribution with df equal to the number of explanatory variables, three in the present example. (Note: Exclude the intercept term in computing the df.)
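The ML estimates in Table 15.8 can be approximated with a short Newton-Raphson routine. The Python sketch below illustrates the idea under our own assumptions: plain Python, starting values of zero, a step-size tolerance of 1e-10, and hypothetical helper names `solve`, `score_and_info`, and `fit_logit`. It is not the algorithm EViews uses. It also computes the restricted (intercept-only) log likelihood, the LR statistic, and McFadden's \(R^2\) as defined in footnote 27.

```python
import math

# Table 15.7: (GPA, TUCE, PSI, Grade) for the 32 students.
DATA = [
    (2.66, 20, 0, 0), (2.89, 22, 0, 0), (3.28, 24, 0, 0), (2.92, 12, 0, 0),
    (4.00, 21, 0, 1), (2.86, 17, 0, 0), (2.76, 17, 0, 0), (2.87, 21, 0, 0),
    (3.03, 25, 0, 0), (3.92, 29, 0, 1), (2.63, 20, 0, 0), (3.32, 23, 0, 0),
    (3.57, 23, 0, 0), (3.26, 25, 0, 1), (3.53, 26, 0, 0), (2.74, 19, 0, 0),
    (2.75, 25, 0, 0), (2.83, 19, 0, 0), (3.12, 23, 1, 0), (3.16, 25, 1, 1),
    (2.06, 22, 1, 0), (3.62, 28, 1, 1), (2.89, 14, 1, 0), (3.51, 26, 1, 0),
    (3.54, 24, 1, 1), (2.83, 27, 1, 1), (3.39, 17, 1, 1), (2.67, 24, 1, 0),
    (3.65, 21, 1, 1), (4.00, 23, 1, 1), (3.10, 21, 1, 0), (2.39, 19, 1, 1),
]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x

def score_and_info(beta, X, y):
    """Log likelihood, gradient, and information matrix (negative Hessian) of the logit."""
    k = len(beta)
    ll, grad = 0.0, [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        ll += yi * z - math.log1p(math.exp(z))     # y ln P + (1 - y) ln(1 - P)
        for j in range(k):
            grad[j] += (yi - p) * xi[j]
            for h in range(k):
                info[j][h] += p * (1.0 - p) * xi[j] * xi[h]
    return ll, grad, info

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """ML logit estimates by Newton-Raphson, starting from zero."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        _, grad, info = score_and_info(beta, X, y)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll, _, info = score_and_info(beta, X, y)
    k = len(beta)
    # Asymptotic standard errors: square roots of the diagonal of the inverse information matrix.
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(k)])[j]) for j in range(k)]
    return beta, se, ll

X = [(1.0, gpa, tuce, psi) for gpa, tuce, psi, _ in DATA]
y = [row[3] for row in DATA]
beta, se, ll_ur = fit_logit(X, y)

# The restricted (intercept-only) log likelihood has a closed form.
n1, n = sum(y), len(y)
ll_r = n1 * math.log(n1 / n) + (n - n1) * math.log(1 - n1 / n)
lr = 2.0 * (ll_ur - ll_r)          # chi-square with 3 df under H0
mcfadden = 1.0 - ll_ur / ll_r      # footnote 27

for name, b, s in zip(("C", "GPA", "TUCE", "PSI"), beta, se):
    print(f"{name:5s} {b:9.4f} {s:8.4f} {b / s:8.4f}")
print(f"LR = {lr:.4f}, McFadden R2 = {mcfadden:.4f}")
```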

Now let us interpret the regression results given in Table 15.8. Each slope coefficient in Eq. (15.8.1) is a partial slope coefficient and measures the change in the estimated logit for a unit change in the value of the given regressor (holding other regressors constant). Thus, the GPA coefficient of 2.8261 means, with other variables held constant, that if GPA increases by a unit, on average the estimated logit increases by about 2.83 units, suggesting a positive relationship between the two. As you can see, all the other regressors have a positive effect on the logit, although statistically the effect of TUCE is not significant. However, together all the regressors have a significant impact on the final grade, as the LR statistic is 15.40 with a p value of about 0.0015, which is very small.

As noted previously, a more meaningful interpretation is in terms of odds, which are obtained by taking the antilog of the various slope coefficients. Thus, if you take the antilog of the PSI coefficient of 2.3786, you will get 10.7897 (\(= e^{2.3786}\)). This suggests that

26 For an accessible discussion, see J. Scott Long, Regression Models for Categorical and Limited Dependent Variables, Sage Publications, Newbury Park, California, 1997, pp. 102-113.

27 Technically, this is defined as \(1 - (\text{LLF}_{ur}/\text{LLF}_r)\). Here \(\text{LLF}_{ur}\) is the unrestricted log likelihood function, in which all regressors are included in the model, and \(\text{LLF}_r\) is the restricted log likelihood function, in which only the intercept is included in the model. Conceptually, \(\text{LLF}_{ur}\) is equivalent to RSS and \(\text{LLF}_r\) is equivalent to TSS of the linear regression model.
TABLE 15.9 Actual and Fitted Values Based on Regression in Table 15.8

Observation   Actual   Fitted    Residual
  1           0        0.02658   -0.02658
  2           0        0.05950   -0.05950
  3           0        0.18726   -0.18726
  4           0        0.02590   -0.02590
  5           1        0.56989    0.43011
  6           0        0.03486   -0.03486
  7           0        0.02650   -0.02650
  8           0        0.05156   -0.05156
  9           0        0.11113   -0.11113
 10           1        0.69351    0.30649
 11           0        0.02447   -0.02447
 12           0        0.19000   -0.19000
 13           0        0.32224   -0.32224
*14           1        0.19321    0.80679
 15           0        0.36099   -0.36099
 16           0        0.03018   -0.03018
 17           0        0.05363   -0.05363
 18           0        0.03859   -0.03859
*19           0        0.58987   -0.58987
 20           1        0.66079    0.33921
 21           0        0.06138   -0.06138
 22           1        0.90485    0.09515
 23           0        0.24177   -0.24177
*24           0        0.85209   -0.85209
 25           1        0.83829    0.16171
*26           1        0.48113    0.51887
 27           1        0.63542    0.36458
 28           0        0.30722   -0.30722
 29           1        0.84170    0.15830
 30           1        0.94534    0.05466
*31           0        0.52912   -0.52912
*32           1        0.11103    0.88897

*Incorrect prediction.

students who are exposed to the new method of teaching have odds of getting an A more than 10 times those of students who are not exposed to it, other things remaining the same.

Suppose we want to compute the actual probability of a student getting an A grade. Consider student number 10 in Table 15.7. Putting the actual data for this student into the estimated logit model given in Table 15.8, the reader can check that the estimated logit value for this student is about 0.82. Using Eq. (15.5.2), the reader can easily check that the estimated probability is 0.69351. This student's actual final grade was an A, which our logit model codes as Y = 1. The estimated probability of 0.69351 is therefore not exactly 1 but close to it.

Recall the count \(R^2\) defined earlier. Table 15.9 gives you the actual and predicted values of the regressand for our illustrative example. From this table you can observe that, out of 32 observations, there were 6 incorrect predictions (students 14, 19, 24, 26, 31, and 32). Hence the count \(R^2\) value is 26/32 = 0.8125, whereas the McFadden \(R^2\) value is 0.3740. Although these two values are not directly comparable, they give you some idea about the orders of magnitude. Besides, one should not overplay the importance of goodness of fit in models where the regressand is dichotomous.
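The count \(R^2\), the fitted probability for student 10, and the PSI odds ratio can all be recomputed from the rounded coefficients reported in Table 15.8. A minimal Python sketch follows; the names `B` and `fitted_prob` are illustrative. Because the coefficients are rounded to four decimals, the fitted probabilities may differ slightly from Table 15.9 in the fourth decimal place.

```python
import math

# Table 15.7 rows: (GPA, TUCE, PSI, Grade), with Grade = 1 for an A.
DATA = [
    (2.66, 20, 0, 0), (2.89, 22, 0, 0), (3.28, 24, 0, 0), (2.92, 12, 0, 0),
    (4.00, 21, 0, 1), (2.86, 17, 0, 0), (2.76, 17, 0, 0), (2.87, 21, 0, 0),
    (3.03, 25, 0, 0), (3.92, 29, 0, 1), (2.63, 20, 0, 0), (3.32, 23, 0, 0),
    (3.57, 23, 0, 0), (3.26, 25, 0, 1), (3.53, 26, 0, 0), (2.74, 19, 0, 0),
    (2.75, 25, 0, 0), (2.83, 19, 0, 0), (3.12, 23, 1, 0), (3.16, 25, 1, 1),
    (2.06, 22, 1, 0), (3.62, 28, 1, 1), (2.89, 14, 1, 0), (3.51, 26, 1, 0),
    (3.54, 24, 1, 1), (2.83, 27, 1, 1), (3.39, 17, 1, 1), (2.67, 24, 1, 0),
    (3.65, 21, 1, 1), (4.00, 23, 1, 1), (3.10, 21, 1, 0), (2.39, 19, 1, 1),
]

# Coefficients as reported (rounded) in Table 15.8.
B = {"C": -13.0213, "GPA": 2.8261, "TUCE": 0.0951, "PSI": 2.3786}

def fitted_prob(gpa, tuce, psi):
    """Estimated probability of an A from the logit model (15.8.1)."""
    z = B["C"] + B["GPA"] * gpa + B["TUCE"] * tuce + B["PSI"] * psi
    return 1.0 / (1.0 + math.exp(-z))

probs = [fitted_prob(gpa, tuce, psi) for gpa, tuce, psi, _ in DATA]
predicted = [1 if p > 0.5 else 0 for p in probs]      # classify with a 0.5 cutoff
wrong = [i + 1 for i, (yhat, row) in enumerate(zip(predicted, DATA)) if yhat != row[3]]
count_r2 = (len(DATA) - len(wrong)) / len(DATA)       # Eq. (15.8.2)

print("PSI odds ratio:", round(math.exp(B["PSI"]), 4))
print("student 10 fitted probability:", round(probs[9], 4))
print("incorrect predictions:", wrong)
print("count R2:", count_r2)
```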
EXAMPLE 15.5
Who Owns a Debit Card? Logit Analysis

We have already seen the results of the linear probability model (LPM) applied to the bank debit card data, so let us see how the logit model does. The results are as follows:

Dependent Variable: DEBIT
Method: ML-Binary Logit (Quadratic hill climbing)
Sample: 1-60
Included observations: 60
Convergence achieved after 4 iterations
Covariance matrix computed using second derivatives

Variable    Coefficient   Std. Error   z-Statistic   Prob.
C           -0.5749        0.785787     -0.731624     0.4644
Balance      0.001248      0.000697      1.789897     0.0735
ATM         -0.120225      0.093984     -1.279205     0.2008
Interest    -1.352086      0.680988     -1.985478     0.0471

McFadden R-squared      0.080471    Mean dependent var.      0.433333
S.D. dependent var.     0.499717    S.E. of regression       0.486274
Akaike info criterion   1.391675    Sum squared resid.      13.24192
Schwarz criterion       1.531298    Log likelihood         -37.75024
Hannan-Quinn criter.    1.446289    Restr. log likelihood  -41.05391
LR statistic            6.607325    Avg. log likelihood     -0.629171
Prob. (LR statistic)    0.085525

Obs. with Dep = 0   34    Total obs.   60
Obs. with Dep = 1   26


The positive sign of Balance and the negative signs of ATM and Interest are similar to the LPM, although we cannot directly compare the two. The interpretation of the coefficients in the logit model is different from the LPM. Here, for example, holding the other variables constant, the logit is lower by about 1.35 for customers whose account balances earn interest than for those whose balances do not. If we take the antilog of -1.352086, we get about 0.2587. This means that, other things being the same, the odds of holding a debit card for customers who are paid interest on their balances are only about one-fourth of the odds for customers who are not.

From the estimated LR statistic we see that collectively the three variables are statistically significant at about the 8.5 percent level. If we use the conventional 5 percent significance level, then these variables are only marginally significant.
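The LR test, McFadden \(R^2\), and odds factor quoted in this example can be checked from the log likelihoods in the output. Below is a minimal Python sketch. For 3 degrees of freedom the upper-tail chi-square probability has the closed form \(\operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}\), which avoids needing a statistics library. Small differences in the last digits relative to the printed output are expected, because the printed log likelihoods are rounded.

```python
import math

ll_ur = -37.75024   # log likelihood, all regressors (from the output)
ll_r = -41.05391    # restricted log likelihood, intercept only

lr = 2.0 * (ll_ur - ll_r)                    # LR statistic, 3 df under H0
half = lr / 2.0
p_value = math.erfc(math.sqrt(half)) + math.sqrt(2.0 * lr / math.pi) * math.exp(-half)
mcfadden = 1.0 - ll_ur / ll_r                # McFadden R^2 (footnote 27)
odds_factor = math.exp(-1.352086)            # antilog of the Interest coefficient

print(f"LR = {lr:.4f}, p = {p_value:.4f}, McFadden R2 = {mcfadden:.4f}, e^b = {odds_factor:.4f}")
```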

The McFadden \(R^2\) value is quite low. Using the data, the reader can find out the value of the count \(R^2\).

As noted earlier, unlike the LPM, the slope coefficients do not give us the rate of change 
of probability for a unit change in the regressor. We have to calculate them as shown in 
Table 15.6. Fortunately, this manual task is not necessary, for statistical packages like STATA 
can do this routinely. For our example, the results are as follows: 

Marginal effects after logit
      y = Pr(debit) (predict)
        = .42512423

Variable     dy/dx        Std. Error   z       P>|z|   [95% C.I.]              X
Balance       .000305     .00017        1.79   0.073   -.000029    .000639     1499.87
Interest*    -.2993972    …            -2.32   …       …          -.046199      .266667
ATM          -.0293822    .02397       -1.21   …       …           .015631     10.3

(*) dy/dx is for discrete change of dummy variable from 0 to 1.
The coefficient of 0.000305 suggests that, at the sample means, a unit increase in Balance raises the probability of owning a debit card by about 0.0003 (0.03 percentage point). For customers whose balances earn interest, however, the probability of owning a debit card is lower by about 0.30, or about 30 percentage points. The coefficient of ATM, although statistically insignificant, suggests that if ATM transactions go up by a unit, the probability of owning a debit card goes down by about 0.029, or about 2.9 percentage points.
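The STATA marginal effects can be approximately reproduced from the logit output. The sketch below rests on our assumption about how such output is computed; it is not STATA's code. Continuous regressors get \(\hat{\beta}_j \hat{P}(1 - \hat{P})\) at the predicted probability .42512423. The Interest dummy gets the discrete change in \(\hat{P}\) as it goes from 0 to 1, with the other regressors at their means. The Balance coefficient is recovered as its z-statistic times its standard error, and the logit at the means is backed out from .42512423.

```python
import math

def logistic(z: float) -> float:
    """Logistic CDF, P = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

p_bar = 0.42512423               # predicted Pr(debit) at the means (STATA output)
b_atm, b_int = -0.120225, -1.352086
b_bal = 1.789897 * 0.000697      # Balance coefficient recovered as z-statistic x std. error
mean_int = 0.266667              # sample mean of the Interest dummy (X column)

# Continuous regressors: dP/dX = b * P * (1 - P), evaluated at the means.
density = p_bar * (1.0 - p_bar)
me_bal = b_bal * density
me_atm = b_atm * density

# Dummy regressor: change in P as Interest goes from 0 to 1, other regressors at their means.
z_bar = math.log(p_bar / (1.0 - p_bar))   # logit at the means
z0 = z_bar - b_int * mean_int             # Interest = 0
z1 = z0 + b_int                           # Interest = 1
me_int = logistic(z1) - logistic(z0)

print(f"Balance {me_bal:.6f}  ATM {me_atm:.7f}  Interest {me_int:.7f}")
```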


15.9 The Probit Model 


As we have noted, to explain the behavior of a dichotomous dependent variable we will have to use a suitably chosen cumulative distribution function (CDF). The logit model uses the cumulative logistic function, as shown in Eq. (15.5.2). But this is not the only CDF that one can use. In some applications, the normal CDF has been found useful. The estimating model that emerges from the normal CDF²⁸ is popularly known as the probit model, although sometimes it is also known as the normit model. In principle one could substitute the normal CDF in place of the logistic CDF in Eq. (15.5.2) and proceed as in Section 15.5. Instead of following this route, we will present the probit model based on utility theory, or the rational choice perspective on behavior, as developed by McFadden.²⁹

To motivate the probit model, assume that in our home ownership example the decision of the ith family to own a house or not depends on an unobservable utility index \(I_i\) (also known as a latent variable). This index is determined by one or more explanatory variables, say income \(X_i\), in such a way that the larger the value of the index \(I_i\), the greater the probability of a family owning a house. We express the index \(I_i\) as

\[ I_i = \beta_1 + \beta_2 X_i \qquad (15.9.1) \]

where \(X_i\) is the income of the ith family.

How is the (unobservable) index related to the actual decision to own a house? As before, let Y = 1 if the family owns a house and Y = 0 if it does not. Now it is reasonable to assume that there is a critical or threshold level of the index, call it \(I_i^*\), such that if \(I_i\) exceeds \(I_i^*\), the family will own a house; otherwise it will not. The threshold \(I_i^*\), like \(I_i\), is not observable. But if we assume that it is normally distributed with the same mean and variance, it is possible not only to estimate the parameters of the index given in Eq. (15.9.1) but also to get some information about the unobservable index itself. This calculation is as follows.

Given the assumption of normality, the probability that \(I_i^*\) is less than or equal to \(I_i\) can be computed from the standardized normal CDF as:³⁰

\[ P_i = P(Y = 1 \mid X) = P(I_i^* \le I_i) = P(Z_i \le \beta_1 + \beta_2 X_i) = F(\beta_1 + \beta_2 X_i) \qquad (15.9.2) \]


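28 See Appendix A for a discussion of the normal CDF. Briefly, if a variable X follows the normal distribution with mean \(\mu\) and variance \(\sigma^2\), its PDF is

\[ f(X) = \frac{1}{\sqrt{2\sigma^2\pi}}\, e^{-(X-\mu)^2/2\sigma^2} \]

and its CDF is

\[ F(X_0) = \int_{-\infty}^{X_0} \frac{1}{\sqrt{2\sigma^2\pi}}\, e^{-(X-\mu)^2/2\sigma^2}\, dX \]

where \(X_0\) is some specified value of X.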

29 D. McFadden, "Conditional Logit Analysis of Qualitative Choice Behavior," in P. Zarembka (ed.), 
Frontiers in Econometrics, Academic Press, New York, 1973. 

30 A normal distribution with zero mean and unit (= 1) variance is known as a standard or 
standardized normal variable (see Appendix A). 
FIGURE 15.4 Probit model: (a) given \(I_i\), read \(P_i\) from the ordinate; (b) given \(P_i\), read \(I_i\) from the abscissa.
where \(P(Y = 1 \mid X)\) means the probability that an event occurs given the value(s) of the X, or explanatory, variable(s), and where \(Z_i\) is the standard normal variable, i.e., \(Z \sim N(0, 1)\). F is the standard normal CDF, which written explicitly in the present context is:

\[ F(I_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{I_i} e^{-z^2/2}\, dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta_1 + \beta_2 X_i} e^{-z^2/2}\, dz \qquad (15.9.3) \]


Since P represents the probability that an event will occur, here the probability of owning a house, it is measured by the area of the standard normal curve from \(-\infty\) to \(I_i\), as shown in Figure 15.4a.

Now to obtain information on \(I_i\), the utility index, as well as on \(\beta_1\) and \(\beta_2\), we take the inverse of Eq. (15.9.2) to obtain:

\[ I_i = F^{-1}(P_i) = \beta_1 + \beta_2 X_i \qquad (15.9.4) \]


where \(F^{-1}\) is the inverse of the normal CDF. What all this means can be made clear from Figure 15.4. In panel (a) of this figure we obtain from the ordinate the (cumulative) probability of owning a house given \(I_i^* \le I_i\). In panel (b) we obtain from the abscissa the value of \(I_i\) given the value of \(P_i\), which is simply the reverse of the former.
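The two readings of Figure 15.4 correspond to the normal CDF and its inverse. Python's standard library provides both through `statistics.NormalDist` (available since Python 3.8). A minimal sketch:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)   # Z ~ N(0, 1)

# Panel (a): given the index I_i, read P_i = F(I_i) from the ordinate.
i_index = -0.8416
print(round(std_normal.cdf(i_index), 4))      # about 0.2

# Panel (b): given P_i, read I_i = F^{-1}(P_i) from the abscissa.
p_i = 0.75
print(round(std_normal.inv_cdf(p_i), 4))      # 0.6745
```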

But how do we actually go about obtaining the index \(I_i\) as well as estimating \(\beta_1\) and \(\beta_2\)? As in the case of the logit model, the answer depends on whether we have grouped data or ungrouped data. We consider the two cases individually.


Probit Estimation with Grouped Data: gprobit 

We will use the same data that we used for glogit, which are given in Table 15.4. Since we already have \(\hat{P}_i\), the relative frequency (the empirical measure of probability) of owning a house at various income levels as shown in Table 15.5, we can use it to obtain \(I_i\) from the normal CDF as shown in Table 15.10, or from Figure 15.5.
TABLE 15.10 Estimating the Index I_i from the Standard Normal CDF

P_i     I_i = F^{-1}(P_i)
0.20    -0.8416
0.24    -0.7063
0.30    -0.5244
0.35    -0.3853
0.45    -0.1257
0.51     0.0251
0.60     0.2533
0.66     0.4125
0.75     0.6745
0.80     0.8416

Notes: (1) P_i are from Table 15.5; (2) I_i are estimated from the standard normal CDF.

FIGURE 15.5 Normal CDF.

Once we have the estimated \(I_i\), estimating \(\beta_1\) and \(\beta_2\) is relatively straightforward, as we show shortly. In passing, note that in the language of probit analysis the unobservable utility index \(I_i\) is known as the normal equivalent deviate (n.e.d.) or simply normit. Since the n.e.d. or \(I_i\) will be negative whenever \(P_i < 0.5\), in practice the number 5 is added to the n.e.d. and the result is called a probit.

EXAMPLE 15.6
Illustration of Gprobit Using Housing Example


Let us continue with our housing example. We have already presented the results of the 
glogit model for this example. The grouped probit (gprobit) results of the same data are 
as follows: 

Using the n.e.d. (= /) given in Table 15.10, the regression results are as shown in 
Table 15.11. 31 The regression results based on the probits (= n.e.d. + 5) are as shown 
in Table 15.12. 

Except for the intercept term, these results are identical with those given in the 
previous table. But this should not be surprising. (Why?) 


31 The following results are not corrected for heteroscedasticity. See Exercise 15.12 for the appropriate 
procedure to correct heteroscedasticity. 


TABLE 15.11
Dependent Variable: $I$

Variable    Coefficient    Std. Error    t-Statistic    Probability
C           -1.0166        0.0573        -17.7473       1.0397E-07
Income       0.04846       0.00247        19.5585       4.8547E-08

$R^2$ = 0.97951    Durbin-Watson statistic = 0.91384

TABLE 15.12
Dependent Variable: Probit

Variable    Coefficient    Std. Error    t-Statistic    Probability
C            3.9833        0.05728        69.5336       2.03737E-12
Income       0.04846       0.00247        19.5585       4.8547E-08

$R^2$ = 0.9795    Durbin-Watson statistic = 0.9138

Note: These results are not corrected for heteroscedasticity (see Exercise 15.12). 
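Table 15.11 is an ordinary least-squares regression of the n.e.d. on income. As a rough check, the sketch below regresses the $I_i$ of Table 15.10 on the income levels X = 6, 8, 10, 13, 15, 20, 25, 30, 35, 40 (thousand dollars) that appear later in this section. It assumes, because Table 15.5 is not reproduced here, that these incomes pair in ascending order with the ascending $\hat{P}_i$ of Table 15.10; under that assumption the unweighted OLS estimates should come out close to the intercept and slope reported in Table 15.11.

```python
from statistics import NormalDist, mean

# Assumed pairing: income X (thousand dollars) with relative frequency P
x = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
p_hat = [0.20, 0.24, 0.30, 0.35, 0.45, 0.51, 0.60, 0.66, 0.75, 0.80]

# Dependent variable: normal equivalent deviate I = F^{-1}(P)
y = [NormalDist().inv_cdf(p) for p in p_hat]

# Two-variable OLS formulas (no heteroscedasticity correction)
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar

print(f"intercept = {b1:.4f}, slope = {b2:.5f}")
```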


Interpretation of the Probit Estimates in Table 15.11 

How do we interpret the preceding results? Suppose we want to find out the effect of a unit change in X (income measured in thousands of dollars) on the probability that $Y = 1$, that is, that a family purchases a house. To do this, look at Eq. (15.9.2). We want to take the derivative of this function with respect to X (that is, the rate of change of the probability with respect to income). It turns out that this derivative is:$^{32}$

$$\frac{dP_i}{dX_i} = f(\beta_1 + \beta_2 X_i)\,\beta_2 \qquad (15.9.5)$$

where $f(\beta_1 + \beta_2 X_i)$ is the standard normal probability density function evaluated at $\beta_1 + \beta_2 X_i$. As you will realize, this evaluation will depend on the particular value of the X variables. Let us take a value of X from Table 15.5, say, X = 6 (thousand dollars). Using the estimated values of the parameters given in Table 15.11, we thus want to find the normal density function at $f[-1.0166 + 0.04846(6)] = f(-0.72584)$. If you refer to the normal distribution tables, you will find that for $Z = -0.72584$, the normal density is about 0.3066.$^{33}$ Now multiplying this value by the estimated slope coefficient of 0.04846, we obtain 0.01485. This means that starting with an income level of 6,000 dollars, if income goes up by 1,000 dollars, the probability of a family purchasing a house goes up by about 0.015, that is, about 1.5 percentage points. (Compare this result with that given in Table 15.6.)

As you can see from the preceding discussion, compared with the LPM and logit 
models, the computation of changes in probability using the probit model is a bit tedious. 
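The computation behind Eq. (15.9.5) at X = 6 is short enough to script. A minimal Python sketch, using the Table 15.11 estimates and the standard normal density from the standard library (`marginal_effect` is simply a helper name chosen for this sketch):

```python
from statistics import NormalDist

# Grouped-probit estimates from Table 15.11
b1, b2 = -1.0166, 0.04846

def marginal_effect(x: float) -> float:
    """dP/dX = f(b1 + b2*x) * b2, with f the standard normal density (Eq. 15.9.5)."""
    z = b1 + b2 * x
    return NormalDist().pdf(z) * b2

me_6 = marginal_effect(6.0)   # income of 6 thousand dollars
print(f"index = {b1 + b2 * 6.0:.5f}, dP/dX = {me_6:.5f}")
```

Changing the argument of `marginal_effect` shows how the effect of an extra 1,000 dollars of income varies with the income level, which is the sense in which the probit marginal effect depends on X.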

32 We use the chain rule of derivatives:

$$\frac{dP_i}{dX_i} = \frac{dF(t)}{dt} \cdot \frac{dt}{dX_i}$$

where $t = \beta_1 + \beta_2 X_i$.

33 Note that the standard normal Z can range from $-\infty$ to $+\infty$, but the density function $f(Z)$ is always positive.

Instead of computing changes in probability, suppose you want to find the estimated probabilities from the fitted gprobit model. This can be done easily. Using the data in Table 15.11 and inserting the values of X from Table 15.5, the reader can check that the estimated n.e.d. values (to two digits) are as follows:

X                    6      8      10     13     15     20     25    30    35    40
Estimated n.e.d.    -0.73  -0.63  -0.53  -0.39  -0.29  -0.05  0.19  0.44  0.68  0.92

Now statistical packages such as MINITAB can easily compute the (cumulative) probabilities associated with the various n.e.d.'s. For example, corresponding to the n.e.d. of about -0.63 (X = 8), the estimated probability is 0.2647 and, corresponding to the n.e.d. of about 0.44 (X = 30), the estimated probability is 0.6691. If you compare these estimates with the actual values given in Table 15.5, you will find that the two are fairly close, suggesting that the fitted model is quite good. Graphically, what we have just done is already shown in Figure 15.4.
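The fitted probabilities are the standard normal CDF evaluated at the estimated n.e.d. The sketch below, a stand-in for the MINITAB computation mentioned above rather than a reproduction of it, evaluates $\Phi(\hat{\beta}_1 + \hat{\beta}_2 X)$ at each income level with the Table 15.11 estimates:

```python
from statistics import NormalDist

b1, b2 = -1.0166, 0.04846                          # gprobit estimates, Table 15.11
incomes = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]   # X, thousand dollars

std_normal = NormalDist()
fitted = {}
for x in incomes:
    ned = b1 + b2 * x                   # estimated normal equivalent deviate
    fitted[x] = std_normal.cdf(ned)     # estimated probability of owning a house
    print(f"X = {x:2d}   n.e.d. = {ned: .4f}   P = {fitted[x]:.4f}")
```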

The Probit Model for Ungrouped or Individual Data 

Let us revisit Table 15.7, which gives data on 32 individuals about their final grade in an 
intermediate microeconomics course in relation to the variables GPA, TUCE, and PSI. The 
results of the logit regression are given in Table 15.8. Let us see what the probit results look 
like. Notice that as in the case of the logit model for individual data, we will have to use a 
nonlinear estimating procedure based on the method of maximum likelihood. The regression results calculated by EViews 6 are given in Table 15.13.

“Qualitatively,” the results of the probit model are comparable with those obtained from 
the logit model in that GPA and PSI are individually statistically significant. Collectively, 
all the coefficients are statistically significant, since the value of the LR statistic is 15.5458 
with a p value of 0.0014. For reasons discussed in the next sections, we cannot directly 
compare the logit and probit regression coefficients. 

For comparative purposes, we present the results based on the linear probability model (LPM) for the grade data in Table 15.14. Again, qualitatively, the LPM results are similar to those of the logit and probit models in that GPA and PSI are individually statistically significant but TUCE is not. Also, together the explanatory variables have a significant impact on grade, as the F value of 6.6456 is statistically significant because its p value is only 0.0015.

TABLE 15.13
Dependent Variable: grade
Method: ML - Binary probit
Convergence achieved after 5 iterations

Variable    Coefficient    Std. Error    Z-Statistic    Probability
C           -7.4523        2.5424        -2.9311        0.0034
GPA          1.6258        0.6939         2.3430        0.0191
TUCE         0.0517        0.0838         0.6166        0.5374
PSI          1.4263        0.5950         2.3970        0.0165

LR statistic (3 df) = 15.5458    Probability (LR stat) = 0.0014    McFadden $R^2$ = 0.3774

TABLE 15.14
Dependent Variable: grade

Variable    Coefficient    Std. Error    t-Statistic    Probability
C           -1.4980        0.5238        -2.8594        0.0079
GPA          0.4638        0.1619         2.8640        0.0078
TUCE         0.0104        0.0194         0.5386        0.5943
PSI          0.3785        0.1391         2.7210        0.0110

$R^2$ = 0.4159    Durbin-Watson d = 2.3464    F-statistic = 6.6456

The Marginal Effect of a Unit Change in the Value 
of a Regressor in the Various Regression Models 

In the linear regression model, the slope coefficient measures the change in the average 
value of the regressand for a unit change in the value of a regressor, with all other variables 
held constant. 

In the LPM, the slope coefficient measures directly the change in the probability of an 
event occurring as the result of a unit change in the value of a regressor, with the effect of 
all other variables held constant. 

In the logit model the slope coefficient of a variable gives the change in the log of the odds associated with a unit change in that variable, again holding all other variables constant. But as noted previously, for the logit model the rate of change in the probability of an event happening is given by $\beta_j P_i(1 - P_i)$, where $\beta_j$ is the (partial regression) coefficient of the jth regressor. But in evaluating $P_i$, all the variables included in the analysis are involved.

In the probit model, as we saw earlier, the rate of change in the probability is somewhat complicated and is given by $\beta_j f(Z_i)$, where $f(Z_i)$ is the density function of the standard normal variable and $Z_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}$, that is, the regression model used in the analysis.

Thus, in both the logit and probit models all the regressors are involved in computing the changes in probability, whereas in the LPM only the jth regressor is involved. This difference may be one reason for the early popularity of the LPM. Statistical packages, such as STATA, have made the task of finding the rate of change of probability for the logit and probit models much easier. So now there is no need to choose the LPM just because of its simplicity.
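To make the probit formula $\beta_j f(Z_i)$ concrete, the sketch below evaluates it for the grade example with the Table 15.13 estimates at one student profile. The profile (GPA = 3.0, TUCE = 22, PSI = 0) is hypothetical, chosen only for illustration. Because every regressor enters through $Z_i$, the same density value multiplies each coefficient. For a dummy regressor such as PSI, the discrete change $\Phi(Z_i \mid \text{PSI} = 1) - \Phi(Z_i \mid \text{PSI} = 0)$ is often reported instead of the derivative, so the sketch computes that as well.

```python
from statistics import NormalDist

# Probit estimates for the grade example (Table 15.13)
coef = {"C": -7.4523, "GPA": 1.6258, "TUCE": 0.0517, "PSI": 1.4263}

# Hypothetical student profile, for illustration only
student = {"GPA": 3.0, "TUCE": 22.0, "PSI": 0.0}

z = coef["C"] + sum(coef[name] * value for name, value in student.items())
density = NormalDist().pdf(z)          # f(Z_i), the standard normal density

# Marginal effect of regressor j: beta_j * f(Z_i)
marginal = {name: coef[name] * density for name in student}

# Discrete change when PSI switches from 0 to 1 (valid because this profile has PSI = 0)
psi_change = NormalDist().cdf(z + coef["PSI"]) - NormalDist().cdf(z)

print(f"Z = {z:.4f}, f(Z) = {density:.4f}")
for name, effect in marginal.items():
    print(f"{name:5s} {effect:.4f}")
print(f"PSI 0 -> 1 change in probability: {psi_change:.4f}")
```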

15.10 Logit and Probit Models 

Although for our grade example LPM, logit, and probit give qualitatively similar results, 
we will confine our attention to logit and probit models because of the problems with the 
LPM noted earlier. Between logit and probit, which model is preferable? In most applications the models are quite similar, the main difference being that the logistic distribution has slightly fatter tails, which can be seen from Figure 15.6. That is to say, the conditional probability $P_i$ approaches 0 or 1 at a slower rate in logit than in probit. This can be seen
more clearly from Table 15.15. Therefore, there is no compelling reason to choose one over 
the other. In practice many researchers choose the logit model because of its comparative 
mathematical simplicity. 
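The values of Table 15.15 are not reproduced here, but the same kind of comparison is easy to generate. The sketch below evaluates the standard logistic and standard normal CDFs at a few arbitrary Z values. In the tails the logistic probability stays farther from 0 and 1: at Z = 3, for instance, it is about 0.953, against about 0.999 for the normal. Part of this gap reflects the different variances of the two distributions, discussed next.

```python
import math
from statistics import NormalDist

def logistic_cdf(z: float) -> float:
    """Standard logistic CDF, the basis of the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

normal_cdf = NormalDist().cdf      # standard normal CDF, the basis of the probit model

for z in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    print(f"Z = {z: .1f}   logit P = {logistic_cdf(z):.4f}   probit P = {normal_cdf(z):.4f}")
```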

Though the models are similar, one has to be careful in interpreting the coefficients 
estimated by the two models. For example, for our grade example, the coefficient of GPA 
of 1.6258 of the probit model (see Table 15.13) and 2.8261 of the logit model (see 
Table 15.8) are not directly comparable. The reason is that, although the standard logistic 
(the basis of logit) and the standard normal distributions (the basis of probit) both have a 
mean value of zero, their variances are different; 1 for the standard normal (as we already 
know) and $\pi^2/3$ for the logistic distribution, where $\pi \approx 22/7$. Therefore, if you multiply the probit coefficient by about 1.81 (which is approximately $\pi/\sqrt{3}$), you will get approximately the logit coefficient. For our example, the probit coefficient of GPA is 1.6258. Multiplying this by 1.81, we obtain 2.94, which is close to the logit coefficient. Alternatively, if you multiply a logit coefficient by 0.55 (= 1/1.81), you will get the probit coefficient. Amemiya, however, suggests multiplying a logit estimate by 0.625 to get a better estimate of the corresponding probit estimate.$^{34}$ Conversely, multiplying a probit coefficient by 1.6 (= 1/0.625) gives the corresponding logit coefficient.

FIGURE 15.6 Logit and probit cumulative distributions.

TABLE 15.15 Values of Cumulative Probability Functions
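The rescalings just described are simple arithmetic. A small Python sketch applying them to the GPA coefficients quoted above (1.6258 from Table 15.13, 2.8261 from Table 15.8):

```python
import math

probit_gpa = 1.6258    # probit coefficient of GPA (Table 15.13)
logit_gpa = 2.8261     # logit coefficient of GPA (Table 15.8)

# Ratio of standard deviations: logistic sd = pi/sqrt(3), standard normal sd = 1
scale = math.pi / math.sqrt(3)

logit_from_probit = probit_gpa * 1.81           # rule of thumb based on pi/sqrt(3)
logit_from_probit_amemiya = probit_gpa * 1.6    # Amemiya's factor (1/0.625)
probit_from_logit = logit_gpa * 0.55            # 1/1.81
probit_from_logit_amemiya = logit_gpa * 0.625   # Amemiya's factor

print(f"pi/sqrt(3) = {scale:.4f}")
print(f"probit -> logit: x1.81 gives {logit_from_probit:.4f}, x1.6 gives {logit_from_probit_amemiya:.4f}")
print(f"logit -> probit: x0.55 gives {probit_from_logit:.4f}, x0.625 gives {probit_from_logit_amemiya:.4f}")
```

Both factors are approximations; for this sample the actual logit coefficient (2.8261) lies between the two probit-based predictions (about 2.60 and 2.94).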

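Incidentally, Amemiya has also shown that the coefficients of LPM and logit models are related as follows:

$$\beta_{\text{LPM}} = 0.25\,\beta_{\text{logit}} \qquad \text{except for the intercept}$$

and

$$\beta_{\text{LPM}} = 0.25\,\beta_{\text{logit}} + 0.5 \qquad \text{for the intercept}$$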

We leave it to the reader to find out if these approximations hold for our grade example. 

To conclude our discussion of LPM, logit, and probit models, we consider an extended 
example. 

34 T. Amemiya, "Qualitative Response Models: A Survey," Journal of Economic Literature, vol. 19, 1981, pp. 1483-1536.
EXAMPLE 15.7 To Smoke or Not to Smoke

To find out what factors determine whether or not a person becomes a smoker, we obtained data on 1,196 individuals.$^{35}$ For each individual, there is information on education, age, income, and the price of cigarettes in 1979. The dependent variable is smoker, with 1 = smokers and 0 = nonsmokers. Exercise 15.20 examines this example further, and the data can be found in Table 15.28 on the textbook website. For comparative purposes, we present the results based on LPM, logit, and probit models in tabular form (see Table 15.16). These results have been obtained from STATA version 10.

TABLE 15.16

Variables    LPM               Logit             Probit
Constant      1.1230 (5.96)     2.7450 (3.31)     1.7019 (3.33)
Age          -0.0047 (-5.70)   -0.0208 (-5.58)   -0.0129 (-5.66)
Education    -0.0206 (-4.47)   -0.0909 (-4.40)   -0.0562 (-4.45)
Income        1.03e-06 (0.63)   4.72e-06 (0.66)   2.72e-06 (0.62)
Pcigs79      -0.0051 (-1.80)   -0.0223 (-1.79)   -0.0137 (-1.79)
$R^2$         0.0388            0.0297            0.0301

Notes: Figures in the parentheses are t ratios for LPM and z ratios for logit and probit. For logit and probit, the $R^2$ values are pseudo $R^2$ values.

Although the coefficients of the three models are not directly comparable, qualitatively they are similar. Thus, age, education, and price of cigarettes have a negative impact on smoking and income has a positive impact. Statistically, the income effect is not significantly different from zero and the price effect is significant at about the 8 percent level. In Exercise 15.20, you are asked to apply the conversion factor to render the various coefficients comparable.

In Table 15.17 we present the marginal effect of each variable on the probability of smoking for each model type.

TABLE 15.17

Variables    LPM         Logit       Probit
Age          -0.0047     -0.0048     -0.0049
Education    -0.0206     -0.0213     -0.0213
Income        1.03e-06    1.11e-06    1.03e-06
Pcigs79      -0.0051     -0.0052     -0.0052

Note: The marginal effects of age and education are highly statistically significant, that of the price of cigarettes is significant at about the 8 percent level, and that of income is not statistically significant.


As you will recognize, the marginal effect of a variable on the probability of smoking for 
LPM is directly obtained from the estimated regression coefficients, but for the logit and 
probit models they have to be computed as discussed in the chapter. 

It is interesting that the marginal effects are quite similar for the three models. For example, if the level of education goes up by one unit, on average, the probability of someone becoming a smoker goes down by about 0.02, or about 2 percentage points.


35 These data are from Michael P. Murray, Econometrics: A Modern Introduction, Pearson/Addison- 
Wesley, Boston, 2006, and can be downloaded from www.aw-bc.com/murray. 
15.11 The Tobit Model


An extension of the probit model is the tobit model originally developed by James Tobin, 
the Nobel laureate economist. To explain this model, we continue with our home ownership 
example. In the probit model our concern was with estimating the probability of owning a 
house as a function of some socioeconomic variables. In the tobit model our interest is in 
finding out the amount of money a person or family spends on a house in relation to 
socioeconomic variables. Now we face a dilemma: If a consumer does not purchase a 
house, obviously we have no data on housing expenditure for such consumers; we have 
such data only on consumers who actually purchase a house. 

Thus consumers are divided into two groups, one consisting of, say, $n_1$ consumers about whom we have information on the regressors (say, income, mortgage interest rate, number of people in the family, etc.) as well as the regressand (amount of expenditure on housing) and another consisting of $n_2$ consumers about whom we have information only on the regressors but not on the regressand. A sample in which information on the regressand is available only for some observations is known as a censored sample.$^{36}$ Therefore, the tobit model is also known as a censored regression model. Some authors call such models limited dependent variable regression models because of the restriction put on the values taken by the regressand.

Statistically, we can express the tobit model as 


$$Y_i = \begin{cases} \beta_1 + \beta_2 X_i + u_i & \text{if RHS} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (15.11.1)$$


where RHS = right-hand side. Note: Additional X variables can be easily added to the 
model. 

Can we estimate regression (15.11.1) using only $n_1$ observations and not worry about the remaining $n_2$ observations? The answer is no, for the OLS estimates of the parameters obtained from the subset of $n_1$ observations will be biased as well as inconsistent; that is, they are biased even asymptotically.$^{37}$

To see this, consider Figure 15.7. As the figure shows, if Y is not observed (because of censoring), all such observations ($= n_2$), denoted by crosses, will lie on the horizontal axis. If Y is observed, the observations ($= n_1$), denoted by dots, will lie in the X-Y plane. It is intuitively clear that if we estimate a regression line based on the $n_1$ observations only, the resulting intercept and slope coefficients are bound to be different than if all the $(n_1 + n_2)$ observations were taken into account.
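The bias from discarding censored observations is easy to see in a simulation. The sketch below generates data from Eq. (15.11.1) with hypothetical parameters ($\beta_1 = -1$, $\beta_2 = 1$, $\sigma = 1$, X uniform between 0 and 4; none of these come from the text), then runs OLS on the latent data, on the $n_1$ uncensored observations only, and on all observations with the censored values recorded as zero. Only the first, which is infeasible in practice, recovers the true slope; the other two are pulled toward zero.

```python
import random

def ols_slope(x, y):
    """Slope coefficient of a two-variable OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2024)                 # fixed seed for reproducibility
beta1, beta2, sigma = -1.0, 1.0, 1.0      # hypothetical true parameters
n = 20_000

x = [rng.uniform(0.0, 4.0) for _ in range(n)]
y_star = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in x]   # RHS of (15.11.1)
y = [ys if ys > 0 else 0.0 for ys in y_star]                        # observed, censored at zero

# Keep only the n1 observations for which Y is observed (Y > 0)
x1, y1 = zip(*[(xi, yi) for xi, yi in zip(x, y) if yi > 0])

slope_latent = ols_slope(x, y_star)   # infeasible in practice: the latent Y is not observed
slope_n1 = ols_slope(x1, y1)          # OLS on uncensored observations only
slope_zeros = ols_slope(x, y)         # OLS treating censored values as zeros

print(f"true {beta2}, latent {slope_latent:.3f}, n1 only {slope_n1:.3f}, with zeros {slope_zeros:.3f}")
```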

How then does one estimate tobit, or censored regression, models, such as Eq. (15.11.1)? 
The actual mechanics involves the method of maximum likelihood, which is rather involved 
and is beyond the scope of this book. But the reader can get more information about the ML 
method from the references. 38 
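Although the ML mechanics are beyond our scope, the function being maximized is easy to write down for Eq. (15.11.1) when $u_i$ is normal with variance $\sigma^2$: a censored observation contributes $\ln \Phi[-(\beta_1 + \beta_2 X_i)/\sigma]$, the probability that the right-hand side is not positive, and an observed one contributes $\ln\{\phi[(Y_i - \beta_1 - \beta_2 X_i)/\sigma]/\sigma\}$. A minimal Python sketch of this log-likelihood follows; an actual estimation would hand it to a numerical optimizer, which is not shown, and the two-observation data set is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def tobit_loglik(beta1, beta2, sigma, x, y):
    """Log-likelihood of the tobit model (15.11.1) with normal errors, censored at zero."""
    total = 0.0
    for xi, yi in zip(x, y):
        index = beta1 + beta2 * xi
        if yi <= 0.0:
            # Censored: P(beta1 + beta2*X + u <= 0) = Phi(-index / sigma)
            total += math.log(STD_NORMAL.cdf(-index / sigma))
        else:
            # Observed: normal density of Y_i around the index
            total += math.log(STD_NORMAL.pdf((yi - index) / sigma) / sigma)
    return total

# Hypothetical data: one censored and one observed household
example_ll = tobit_loglik(-1.0, 1.0, 1.0, x=[0.5, 2.0], y=[0.0, 1.5])
print(f"log-likelihood = {example_ll:.4f}")
```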

36 A censored sample should be distinguished from a truncated sample in which information on 
the regressors is available only if the regressand is observed. We will not pursue this topic here, but 
the interested reader may consult William H. Greene, Econometric Analysis, Prentice Hall, 4th ed., 
Englewood Cliffs, NJ, Chapter 19. For an intuitive discussion, see Peter Kennedy, A Guide to 
Econometrics, The MIT Press, Cambridge, Mass., 4th ed., 1998, Chapter 16. 

37 The bias arises from the fact that if we consider only the $n_1$ observations and omit the others, there is no guarantee that $E(u_i)$ will necessarily be zero. And without $E(u_i) = 0$ we cannot guarantee that the OLS estimates will be unbiased. This bias can be readily seen from the discussion in Appendix 3A, Eqs. (4) and (5).

38 See Greene, op. cit. A somewhat less technical discussion can be found in Richard Breen, Regression 
Models: Censored, Sample Selected or Truncated Data, Sage Publications, Newbury Park, California, 1996. 
FIGURE 15.7 Plot of amount of money consumer spends in buying a house versus income. (×: expenditure data not available, but income data available; •: both expenditure and income data available.)


James Heckman has proposed an alternative to the ML method, which is comparatively simple.$^{39}$ This alternative consists of a two-step estimating procedure. In step 1, we first estimate the probability of a consumer owning a house, which is done on the basis of the probit model. In step 2, we estimate the model (15.11.1) by adding to it a variable (called the inverse Mills ratio or the hazard rate) that is derived from the probit estimate. For the actual mechanics, see the Heckman article. The Heckman procedure yields consistent estimates of the parameters of Eq. (15.11.1), but they are not as efficient as the ML estimates. Since most modern statistical software packages have the ML routine, it may be preferable to use these packages rather than the Heckman two-step procedure.
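For reference, the inverse Mills ratio used in step 2 is commonly computed from the step-1 probit index $\hat{Z}_i$ as $\lambda_i = \phi(\hat{Z}_i)/\Phi(\hat{Z}_i)$. A minimal sketch follows; the probit indexes in it are hypothetical values, and the step-2 regression itself is not shown.

```python
from statistics import NormalDist

def inverse_mills_ratio(index: float) -> float:
    """lambda = phi(index) / Phi(index), evaluated at a step-1 probit index."""
    nd = NormalDist()
    return nd.pdf(index) / nd.cdf(index)

# Hypothetical step-1 probit indexes for three consumers
for z in (-1.0, 0.0, 1.5):
    print(f"probit index {z: .1f} -> inverse Mills ratio {inverse_mills_ratio(z):.4f}")
```

The ratio is large when the probit index is low (a consumer unlikely to be observed) and shrinks toward zero as the index rises.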

Illustration of the Tobit Model: Ray Fair's Model 
of Extramarital Affairs 40 

In an interesting and theoretically innovative article, Ray Fair collected a sample of 601 
men and women then married for the first time and analyzed their responses to a question 
about extramarital affairs. 41 The variables used in this study are defined as follows: 

$Y$ = number of affairs in the past year: 0, 1, 2, 3, 4-10 (coded as 7)

$Z_1$ = 0 for female and 1 for male

$Z_2$ = age

$Z_3$ = number of years married

$Z_4$ = children: 0 if no children and 1 if children

$Z_5$ = religiousness on a scale of 1 to 5, 1 being antireligion

$Z_6$ = education, years: grade school = 9, high school = 12, Ph.D. or other = 20

$Z_7$ = occupation, "Hollingshead" scale, 1-7

$Z_8$ = self-rating of marriage, 1 = very unhappy, 5 = very happy

39 J. J. Heckman, "Sample Selection Bias as a Specification Error," Econometrica, vol. 47, pp. 153-161. 
40 Ray Fair, "A Theory of Extramarital Affairs," Journal of Political Economy, vol. 86, 1978, pp. 45-61.

For the article and the data, see http://fairmodel.econ.yale.edu/rayfair/pdf/1978DAT.ZIP. 

41 In 1969 Psychology Today published a 101-question survey on sex and asked its readers to mail in 
their answers. In the July 1970 issue of the magazine the survey results were discussed on the basis of 
about 2,000 replies that were collected in electronic form. Ray Fair extracted the sample of 601 from 
these replies. 
TABLE 15.18 Estimates of Extramarital Affairs

Explanatory Variable    OLS Estimate          Tobit Estimate
Intercept                5.8720 (5.1622)*      7.6084 (1.9479)†
$Z_1$                    0.0540 (0.1799)       0.9457 (0.8898)
$Z_2$                   -0.0509 (-2.2536)     -0.1926 (-2.3799)
$Z_3$                    0.1694 (4.1109)       0.5331 (3.6368)
$Z_4$                   -0.1426 (-0.4072)      1.0191 (0.7965)
$Z_5$                   -0.4776 (-4.2747)     -1.6990 (-4.1906)
$Z_6$                   -0.0137 (-0.2143)      0.0253 (0.1113)
$Z_7$                    0.1049 (1.1803)       0.2129 (0.6631)
$Z_8$                   -0.7118 (-5.9319)     -2.2732 (-5.4724)
$R^2$                    0.1317                0.1515

Note: In all there are 601 observations, of which 451 have zero values for the dependent variable (number of extramarital affairs) and 150 have nonzero values.


Of the 601 responses, 451 individuals had no extramarital affairs, and 150 individuals had 
one or more affairs. 

In terms of Figure 15.7, if we plot the number of affairs on the vertical axis and, say, education on the horizontal axis, there will be 451 observations lying along the horizontal axis. Thus, we have a censored sample, and a tobit model may be appropriate.

Table 15.18 gives estimates of the preceding model using both (the inappropriate) OLS and (the appropriate) ML procedures. As you can see, both sets of estimates use all 601 individuals: the 451 who had no affairs and the 150 who had one or more affairs. The ML method takes the censoring at zero into account explicitly, but the OLS method does not; hence the difference between the two sets of estimates. For reasons already discussed, one should rely on the ML and not the OLS estimates. The coefficients in the two models can be interpreted like any other regression coefficients. The negative coefficient of $Z_8$ (marital happiness) means that the higher the marital happiness, the lower is the incidence of extramarital affairs, perhaps an unsurprising finding.

In passing, note that if we are interested in the probability of extramarital affairs and not in the number of such affairs, we can use the probit model assigning $Y = 0$ for individuals who did not have any affairs and $Y = 1$ for those who had such affairs, giving the results shown in Table 15.19. With the knowledge of probit modeling, readers should be able to interpret the probit results given in this table on their own.

15.12 Modeling Count Data: The Poisson Regression Model 

There are many phenomena where the regressand is of the count type, such as the number 
of vacations taken by a family per year, the number of patents received by a firm per year, 
the number of visits to a dentist or a doctor per year, the number of visits to a grocery store 
per week, the number of parking or speeding tickets received per year, the number of days 
stayed in a hospital in a given period, the number of cars passing through a toll booth in a 
span of, say, 5 minutes, and so on. The underlying variable in each case is discrete, taking 
only a finite number of values. Sometimes count data can also refer to rare, or infrequent, 
occurrences, such as getting hit by lightning in a span of a week, winning more than one lottery within 2 weeks, or having two or more heart attacks in a span of 4 weeks. How do we
model such phenomena? 
TABLE 15.19
Dependent Variable: YSTAR
Method: ML - Binary probit
Sample: 1-601
Included observations: 601
Convergence achieved after 5 iterations

Variable    Coefficient    Std. Error    Z-Statistic    Probability
C            0.779402      0.512549       1.520638      0.1284
$Z_1$        0.173457      0.137991       1.257015      0.2087
$Z_2$       -0.024514      0.010418      -2.350844      0.0183
$Z_3$        0.054343      0.018809       2.889278      0.0039
$Z_4$        0.216644      0.165168       1.311657      0.1896
$Z_5$       -0.185468      0.051626      -3.592551      0.0003
$Z_6$        0.011262      0.029517       0.381556      0.7028
$Z_7$        0.013669      0.041404       0.330129      0.7413
$Z_8$       -0.271791      0.053475      -5.082608      0.0000

Mean dependent var.       0.249584     S.D. dependent var.       0.433133
S.E. of regression        0.410279     Akaike info criterion     1.045584
Sum squared resid.       99.65088      Schwarz criterion         1.111453
Log likelihood         -305.1980       Hannan-Quinn criter.      1.071224
Restr. log likelihood  -337.6885       Avg. log likelihood      -0.507817
LR statistic (8 df)      64.98107      McFadden R-squared        0.096215
Probability (LR stat)     4.87E-11

Obs. with Dep = 0    451    Total obs.    601
Obs. with Dep = 1    150



Just as the Bernoulli distribution was chosen to model the yes/no decision in the linear 
probability model, the probability distribution that is specifically suited for count data is the 
Poisson probability distribution. The pdf of the Poisson distribution is given by: 42 

$$f(Y) = \frac{\mu^{Y} e^{-\mu}}{Y!} \qquad Y = 0, 1, 2, \ldots \qquad (15.12.1)$$

where $f(Y)$ denotes the probability that the variable Y takes non-negative integer values, and where $Y!$ (read Y factorial) stands for $Y! = Y \times (Y - 1) \times (Y - 2) \times \cdots \times 2 \times 1$. It can be proved that

$$E(Y) = \mu \qquad (15.12.2)$$

$$\operatorname{var}(Y) = \mu \qquad (15.12.3)$$

Notice an interesting feature of the Poisson distribution: Its variance is the same as its 
mean value. 

The Poisson regression model may be written as: 

$$Y_i = E(Y_i) + u_i = \mu_i + u_i \qquad (15.12.4)$$


42 See any standard book on statistics for the details of this distribution. 









where the Y's are independently distributed as Poisson random variables with mean $\mu_i$ for each individual expressed as

$$\mu_i = E(Y_i) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} \qquad (15.12.5)$$

where the X's are some of the variables that might affect the mean value. For example, if our count variable is the number of visits to the Metropolitan Museum of Art in New York in a given year, this number will depend on variables such as income of the consumer, admission price, distance from the museum, and parking fees.

For estimation purposes, we write the model as:

$$Y_i = \frac{\mu^{Y_i} e^{-\mu}}{Y_i!} + u_i \qquad (15.12.6)$$

with $\mu$ replaced by Eq. (15.12.5). As you can readily see, the resulting regression model will be nonlinear in the parameters, necessitating nonlinear regression estimation discussed in the previous chapter. Let us consider a concrete example to see how all this works out.

EXAMPLE 15.8 An Illustrative Example: Geriatric Study of Frequency of Falls

The data used here were collected by Neter et al.$^{43}$ The data relate to 100 individuals 65 years of age and older. The objective of the study was to record the number of falls ($= Y$) suffered by these individuals in relation to gender ($X_2$ = 0 for female and 1 for male), a balance index ($X_3$), and a strength index ($X_4$). The higher the balance index, the more stable is the subject, and the higher the strength index, the stronger is the subject. To find out if education or education plus aerobic exercise has any effect on the number of falls, the authors introduced an additional variable ($X_1$), called the intervention variable, such that $X_1 = 0$ if only education and $X_1 = 1$ if education plus aerobic exercise training. The subjects were randomly assigned to the two intervention methods.

Using EViews 6, we obtained the output in Table 15.20.

TABLE 15.20


Dependent Variable: Y
Sample: 1-100
Convergence achieved after 7 iterations
Y = EXP(C(0) + C(1)*X1 + C(2)*X2 + C(3)*X3 + C(4)*X4)

          Coefficient    Std. Error    t-Statistic    Probability
C(0)       0.37020       0.3459         1.0701        0.2873
C(1)      -1.10036       0.1705        -6.4525        0.0000
C(2)      -0.02194       0.1105        -0.1985        0.8431
C(3)       0.01066       0.0027         3.9483        0.0001
C(4)       0.00927       0.00414        2.2380        0.0271

$R^2$ = 0.4857    Adjusted $R^2$ = 0.4640
Log likelihood = -197.2096    Durbin-Watson statistic = 1.7358

Note: EXP( ) means e (the base of the natural logarithm) raised to the power of the expression in ( ).


43 John Neter, Michael H. Kutner, Christopher J. Nachtsheim, and William Wasserman, Applied 
Regression Models, Irwin, 3d ed., Chicago, 1996. The data were obtained from the data disk included 
in the book and refer to Exercise 14.28. 
Interpretation of Results. Keep in mind that what we have obtained in Table 15.20 is the estimated mean value for the ith individual, $\hat{\mu}_i$; that is, what we have estimated is:

$$\hat{\mu}_i = e^{0.3702 - 1.100366 X_{1i} - 0.02194 X_{2i} + 0.0106 X_{3i} + 0.00927 X_{4i}} \qquad (15.12.7)$$

To find the actual mean value for the ith subject, we need to put in the values of the various X variables for that subject. For example, subject 99 had these values: Y = 4, $X_1 = 0$, $X_2 = 1$, $X_3 = 50$, and $X_4 = 56$. Putting these values in Eq. (15.12.7), we obtain $\hat{\mu}_{99} = 3.3538$ as the estimated mean value for the 99th subject. The actual Y value for this individual was 4.

Now if we want to find out the probability that a subject similar to subject 99 has less 
than 5 falls per year, we can obtain it as follows: 

$$\begin{aligned}
P(Y < 5) &= P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4) \\
&= \frac{(3.3538)^0 e^{-3.3538}}{0!} + \frac{(3.3538)^1 e^{-3.3538}}{1!} + \frac{(3.3538)^2 e^{-3.3538}}{2!} \\
&\quad + \frac{(3.3538)^3 e^{-3.3538}}{3!} + \frac{(3.3538)^4 e^{-3.3538}}{4!} \\
&\approx 0.7527
\end{aligned}$$
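The Poisson sum above can be checked in a few lines of Python, using only Eq. (15.12.1) and the estimated mean $\hat{\mu}_{99} = 3.3538$:

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) = mu**y * exp(-mu) / y!, as in Eq. (15.12.1)."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu_99 = 3.3538                                    # estimated mean for subject 99
p_less_than_5 = sum(poisson_pmf(y, mu_99) for y in range(5))
print(f"P(Y < 5) = {p_less_than_5:.4f}")
```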

We can also find out the marginal, or partial, effect of a regressor on the mean value of 
Y as follows. In terms of our illustrative example, suppose we want to find out the effect of 
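a unit increase in the strength index ($X_4$) on mean Y. Since

$$\mu_i = e^{C_0 + C_1 X_{1i} + C_2 X_{2i} + C_3 X_{3i} + C_4 X_{4i}} \qquad (15.12.8)$$

we want to find $\partial \mu_i / \partial X_{4i}$. Using the chain rule of calculus, it can be easily shown that this is equal to

$$\frac{\partial \mu_i}{\partial X_{4i}} = C_4\, e^{C_0 + C_1 X_{1i} + C_2 X_{2i} + C_3 X_{3i} + C_4 X_{4i}} = C_4 \mu_i \qquad (15.12.9)$$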

That is, the rate of change of the mean value with respect to a regressor is equal to the coefficient of that regressor times the mean value. Of course, the mean value $\mu_i$ will depend on the values taken by all the regressors in the model. This is similar to the logit and probit models we discussed earlier, where the marginal contribution of a variable also depended on the values taken by all the variables in the model.
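Eq. (15.12.9) can be verified numerically. The sketch below uses the coefficients of Table 15.20 and compares $C_4 \hat{\mu}_i$ with a central finite-difference derivative of $\hat{\mu}_i$ with respect to $X_4$. The regressor values are hypothetical, chosen only for illustration.

```python
import math

# Coefficients C(0) to C(4) from Table 15.20
C = [0.37020, -1.10036, -0.02194, 0.01066, 0.00927]

def mean_falls(x1, x2, x3, x4):
    """Estimated mean number of falls: exp(C0 + C1*X1 + C2*X2 + C3*X3 + C4*X4)."""
    return math.exp(C[0] + C[1] * x1 + C[2] * x2 + C[3] * x3 + C[4] * x4)

# Hypothetical subject, for illustration only
x1, x2, x3, x4 = 0, 1, 45.0, 60.0

mu = mean_falls(x1, x2, x3, x4)
analytic = C[4] * mu                     # Eq. (15.12.9): coefficient times mean

h = 1e-6                                 # step for a central finite difference
numeric = (mean_falls(x1, x2, x3, x4 + h) - mean_falls(x1, x2, x3, x4 - h)) / (2 * h)

print(f"mu = {mu:.4f}, C4*mu = {analytic:.6f}, finite difference = {numeric:.6f}")
```

For subject 99, with $\hat{\mu}_{99} = 3.3538$, the same formula gives roughly 0.00927 × 3.3538 ≈ 0.031 additional falls per year for a one-unit increase in the strength index.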

Returning to the statistical significance of the individual coefficients, we see that the intercept and variable $X_2$ are individually statistically insignificant. But note that the standard errors given in the table are asymptotic and hence the t values are to be interpreted asymptotically. As noted previously, generally the results of all nonlinear iterative estimating procedures have validity in large samples only.

In concluding our discussion of the Poisson regression model, it may be noted that the 
model makes restrictive assumptions in that the mean and the variance of the Poisson 
process are the same and that the probability of an occurrence is constant at any point 
in time. 


15.13 Further Topics in Qualitative Response Regression Models 

As noted at the outset, the topic of qualitative response regression models is vast. What we 
have presented in this chapter are some of the basic models in this area. For those who want 
to pursue this topic further, we discuss below very briefly some other models in this area. 
We will not pursue them here, for that would take us far away from the scope of this book. 
Ordinal Logit and Probit Models

In the binary logit and probit models we were interested in modeling a yes or no response variable. But often the response variable, or regressand, can have more than two outcomes and very often these outcomes are ordinal in nature; that is, they cannot be expressed on an interval scale. Frequently, in survey-type research the responses are on a
Likert-type scale, such as “strongly agree,” “somewhat agree,” or “strongly disagree.” Or 
the responses in an educational survey may be “less than high school,” “high school,” 
“college,” or “professional degrees.” Very often these responses are coded as 0 (less than 
high school), 1 (high school), 2 (college), 3 (postgraduate). These are ordinal scales in that 
there is clear ranking among the categories but we cannot say that 2 (college education) is 
twice 1 (high school education) or 3 (postgraduate education) is three times 1 (high school 
education). 

To study phenomena such as the preceding, one can extend the binary logit and probit
models to take into account multiple ranked categories. The arithmetic gets quite involved
as we have to use multistage normal and logistic probability distributions to allow for the 
various ranked categories. For the underlying mathematics and some of the applications, 
the reader may consult the Greene and Maddala texts cited earlier. At a comparatively 
intuitive level, the reader may consult the Liao monograph. 44 Software packages such as 
LIMDEP, EViews, STATA, and SHAZAM have routines to estimate ordered logit and 
probit models. 
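For intuition about how the ranked categories enter, the following Python sketch computes ordered logit probabilities under a standard latent-variable formulation, which is assumed here. An unobserved index Y* equals a linear combination of the regressors plus a logistic error. We observe Y = j when Y* falls between the (j − 1)th and jth threshold, or "cutpoint." The index value and cutpoints below are hypothetical. In practice they are estimated by maximum likelihood, for example with the packages listed above.

```python
import math

def logistic_cdf(z):
    """Logistic CDF written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def ordered_logit_probs(xb, cutpoints):
    """P(Y = j), j = 0..J, when Y* = xb + e with logistic e and
    Y = j if cutpoints[j-1] < Y* <= cutpoints[j] (-inf and +inf at the ends)."""
    if any(lo >= hi for lo, hi in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(Y <= j) = Lambda(cutpoint_j - xb)
    cum = [0.0] + [logistic_cdf(c - xb) for c in cutpoints] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]

# Hypothetical index value and cutpoints for the 0-3 education coding
probs = ordered_logit_probs(xb=0.8, cutpoints=[-1.0, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

Raising the index xb shifts probability toward the higher-ranked categories, while the cutpoints keep the categories in order.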

Multinomial Logit and Probit Models 

In the ordered probit and logit models the response variable has more than two ordered, or
ranked, categories. But there are situations where the regressand is unordered. Take, for
example, the choice of transportation mode to work. The choices may be bicycle, motorbike,
car, bus, or train. Although these are categorical responses, there is no ranking or
order here; they are essentially nominal in character. For another example, consider
occupational classifications, such as unskilled, semiskilled, and highly skilled. Again, there is no
order here. Similarly, occupational choices such as self-employed, working for a private
firm, working for a local government, and working for the federal government are
essentially nominal in character.

The techniques of multinomial logit or probit models can be employed to study such
nominal categories. Again, the mathematics gets a little involved. The references cited
previously will give the essentials of these techniques. And the statistical packages cited earlier
can be used to implement such models, if their use is required in specific cases.
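The sketch below shows what the multinomial logit computes. It assumes the usual formulation in which one alternative serves as the base category, with its coefficients set to zero. Each alternative's probability is then its exponentiated index divided by the sum over all alternatives. The transportation-mode labels and coefficient values in the usage line are hypothetical.

```python
import math

def multinomial_logit_probs(x, betas_by_category):
    """Choice probabilities for J+1 unordered alternatives.

    Alternative 0 is the base category, whose coefficients are normalized to zero;
    betas_by_category[j-1] holds the coefficients of alternative j (j = 1..J).
    P(Y = j | x) = exp(x'b_j) / sum_k exp(x'b_k), with b_0 = 0.
    """
    index = [0.0] + [sum(b * xi for b, xi in zip(beta, x)) for beta in betas_by_category]
    top = max(index)                      # subtract the max to avoid overflow
    weights = [math.exp(v - top) for v in index]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: base = bicycle, then car and bus; x = [1 (intercept), income]
probs = multinomial_logit_probs([1.0, 2.0], [[0.5, 0.1], [-0.2, 0.3]])
print([round(p, 3) for p in probs])
```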

Duration Models 

Consider questions such as these: (1) What determines the duration of unemployment
spells? (2) What determines the life of a light bulb? (3) What factors determine the
duration of a strike? (4) What determines the survival time of an HIV-positive patient?

Subjects such as these are the topic of duration models, popularly known as survival
analysis or time-to-event data analysis. In each of the examples cited above, the key
variable is the length of time or spell length, which is modeled as a random variable. Again
the mathematics involves the CDFs and PDFs of appropriate probability distributions.
Although the technical details can be tedious, there are accessible books on this subject. 45

44 Tim Futing Liao, op. cit. 

45 See, for example, David W. Hosmer, Jr., and Stanley Lemeshow, Applied Survival Analysis, John Wiley 
& Sons, New York, 1999. 




Statistical packages such as STATA and LIMDEP can easily estimate such duration 
models. These packages have worked examples to aid the researcher in the use of such 
models. 
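The Kaplan-Meier (product-limit) estimator of the survival function is a standard nonparametric starting point in the survival-analysis literature. A minimal Python sketch of it is shown below to illustrate the kind of quantity survival analysis works with. It is not a duration regression model. It shows how spells that are still in progress when observation stops (censored spells) are handled. The spell lengths and event indicators are hypothetical.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) = P(T > t).

    durations: observed spell lengths; events: 1 if the spell ended (e.g., the worker
    found a job), 0 if it was still in progress when observation stopped (censored).
    Returns a list of (t, S(t)) at each distinct time at which at least one spell ended.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        ended, removed = 0, 0
        # Group all spells (ended or censored) that share this duration
        while i < len(data) and data[i][0] == t:
            ended += data[i][1]
            removed += 1
            i += 1
        if ended > 0:
            survival *= 1.0 - ended / n_at_risk
            curve.append((t, survival))
        # Spells censored at t count as at risk at t, then leave the risk set
        n_at_risk -= removed
    return curve

# Hypothetical unemployment spells (in months); 0 marks a spell still in progress
curve = kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1])
print(curve)
```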


Summary and Conclusions


1. Qualitative response regression models refer to models in which the response, or
regressand, variable is not quantitative or measured on an interval scale.

2. The simplest possible qualitative response regression model is the binary model in which 
the regressand is of the yes/no or presence/absence type. 

3. The simplest possible binary regression model is the linear probability model (LPM)
in which the binary response variable is regressed on the relevant explanatory
variables by using the standard OLS methodology. Simplicity may not be a virtue here, for
the LPM suffers from several estimation problems. Even if some of the estimation
problems can be overcome, the fundamental weakness of the LPM is that it assumes
that the probability of something happening increases linearly with the level of the
regressor. This very restrictive assumption can be avoided if we use the logit and probit
models.

4. In the logit model the dependent variable is the log of the odds ratio, which is a linear 
function of the regressors. The probability function that underlies the logit model is the 
logistic distribution. If the data are available in grouped form, we can use OLS to 
estimate the parameters of the logit model, provided we take into account explicitly the 
heteroscedastic nature of the error term. If the data are available at the individual, or 
micro, level, nonlinear-in-the-parameter estimating procedures are called for. 

5. If we choose the normal distribution as the appropriate probability distribution, then 
we can use the probit model. This model is mathematically a bit difficult as it involves 
integrals. But for all practical purposes, both logit and probit models give similar 
results. In practice, the choice therefore depends on the ease of computation, which 
is not a serious problem with sophisticated statistical packages that are now readily 
available. 

6. If the response variable is of the count type, the model that is most frequently used in
applied work is the Poisson regression model, which is based on the Poisson probability
distribution.

7. A model that is closely related to the probit model is the tobit model, also known as a 
censored regression model. In this model, the response variable is observed only if a 
certain condition(s) is met. Thus, the question of how much one spends on a car is 
meaningful only if one decides to buy a car to begin with. However, Maddala notes that 
the tobit model is “applicable only in those cases where the latent variable [i.e., the 
basic variable underlying a phenomenon] can, in principle, take negative values and the 
observed zero values are a consequence of censoring and nonobservability.” 46 

8. There are various extensions of the binary response regression models. These include
ordered probit and logit and nominal probit and logit models. The philosophy underlying
these models is the same as that of the simpler logit and probit models, although the
mathematics gets rather complicated.

9. Finally, we considered briefly the so-called duration models in which the duration 
of a phenomenon, such as unemployment or sickness, depends on several factors. In such 
models, the length, or the spell of duration, becomes the variable of research interest. 


46 G. S. Maddala, Introduction to Econometrics, 2d ed., Macmillan, New York, 1992, p. 342.
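Point 5 above says that logit and probit models usually give similar results. One way to see why is that the logistic CDF, after its argument is rescaled, is almost indistinguishable from the standard normal CDF. The Python sketch below checks this numerically on a grid. It assumes the commonly quoted scale factor of about 1.7; the value 1.702 used here is an assumption, not a figure from the text. The same rescaling explains why logit coefficients tend to exceed probit coefficients by roughly a constant factor even when the fitted probabilities agree.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic CDF (safe here because |z| stays below about 9 on the grid)."""
    return 1.0 / (1.0 + math.exp(-z))

SCALE = 1.702   # assumed rescaling factor for the logistic argument
grid = [i / 100.0 for i in range(-500, 501)]   # z from -5 to 5 in steps of 0.01

max_diff = max(abs(logistic_cdf(SCALE * z) - normal_cdf(z)) for z in grid)
unscaled_diff = max(abs(logistic_cdf(z) - normal_cdf(z)) for z in grid)
print(f"max |logistic(1.702 z) - normal(z)| = {max_diff:.4f}")
print(f"max |logistic(z) - normal(z)|       = {unscaled_diff:.4f}")
```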
EXERCISES


Questions 

15.1. Refer to the data given in Table 15.2. If Ŷi is negative, assume it to be equal to 0.01
and if it is greater than 1, assume it to be equal to 0.99. Recalculate the weights wi
and estimate the LPM using WLS. Compare your results with those given in
Eq. (15.2.11) and comment.

15.2. For the home ownership data given in Table 15.1, the maximum likelihood
estimates of the logit model are as follows:

$$\hat{L}_i = \ln\left(\frac{\hat{P}_i}{1 - \hat{P}_i}\right) = -493.54 + 32.96\,\text{income}$$

t = (−0.000008) (0.000008)

Comment on these results, bearing in mind that all values of income above 16 (thousand
dollars) correspond to Y = 1 and all values of income below 16 correspond to
Y = 0. A priori, what would you expect in such a situation?

15.3. In studying the purchase of durable goods Y (Y = 1 if purchased, Y = 0 if no
purchase) as a function of several variables for a total of 762 households, Janet A.
Fisher* obtained the following LPM results:


Explanatory Variable                                  Coefficient   Standard Error
Constant                                                0.1411
1957 disposable income, X1                              0.0251        0.0118
(Disposable income = X1)², X2                          -0.0004        0.0004
Checking accounts, X3                                  -0.0051        0.0108
Savings accounts, X4                                    0.0013        0.0047
U.S. savings bonds, X5                                 -0.0079        0.0067
Housing status: rent, X6                               -0.0469        0.0937
Housing status: own, X7                                 0.0136        0.0712
Monthly rent, X8                                       -0.7540        1.0983
Monthly mortgage payments, X9                          -0.9809        0.5162
Personal noninstallment debt, X10                      -0.0367        0.0326
Age, X11                                                0.0046        0.0084
Age squared, X12                                       -0.0001        0.0001
Marital status, X13 (1 = married)                       0.1760        0.0501
Number of children, X14                                 0.0398        0.0358
(Number of children = X14)², X15                       -0.0036        0.0072
Purchase plans, X16 (1 = planned; 0 otherwise)          0.1760        0.0384
R² = 0.1336


Notes: All financial variables are in thousands of dollars. 

Housing status: Rent (1 if rents; 0 otherwise). 

Housing status: Own (1 if owns; 0 otherwise). 

Source: Janet A. Fisher, op. cit., Table 1, p. 67.

a. Comment generally on the fit of the equation.

b. How would you interpret the coefficient of −0.0051 attached to the checking
accounts variable? How would you rationalize the negative sign for this variable?

c. What is the rationale behind introducing the age-squared and number-of-children-squared
variables? Why is the sign negative in both cases?


*"An Analysis of Consumer Goods Expenditure," The Review of Economics and Statistics, vol. 64, no. 1, 
1962, pp. 64-71. 
d. Assuming values of zero for all but the income variable, find out the conditional
probability of a household whose income is $20,000 purchasing a durable good. 

e. Estimate the conditional probability of owning durable good(s), given: X1 =
$15,000, X3 = $3,000, X4 = $5,000, X6 = 0, X7 = 1, X8 = $500, X9 = $300,
X10 = 0, X11 = 35, X13 = 1, X14 = 2, X16 = 0.

15.4. The R² value in the labor-force participation regression given in Table 15.3 is 0.175,
which is rather low. Can you test this value for statistical significance? Which test do
you use and why? Comment in general on the value of R² in such models.

15.5. Estimate the probabilities of owning a house at the various income levels
underlying the regression (15.7.1). Plot them against income and comment on the resulting
relationship.

*15.6. In the probit regression given in Table 15.11 show that the intercept is equal to
−μx/σx and the slope is equal to 1/σx, where μx and σx are the mean and standard
deviation of X.

15.7. From data for 54 standard metropolitan statistical areas (SMSA), Demaris estimated
the following logit model to explain high murder rate versus low murder rate:**

$$\ln \hat{O}_i = 1.1387 + 0.0014P_i + 0.0561C_i - 0.4050R_i$$

se = (0.0009) (0.0227) (0.1568)

where O = the odds of a high murder rate, P = 1980 population size in thousands,
C = population growth rate from 1970 to 1980, R = reading quotient, and the se are
the asymptotic standard errors.

a. How would you interpret the various coefficients? 

b. Which of the coefficients are individually statistically significant? 

c. What is the effect of a unit increase in the reading quotient on the odds of
having a higher murder rate?

d. What is the effect of a percentage point increase in the population growth rate on 
the odds of having a higher murder rate? 

15.8. Compare and comment on the OLS and WLS regressions in Eqs. (15.7.3) and 
(15.7.1). 

Empirical Exercises 

15.9. From the household budget survey of 1980 of the Dutch Central Bureau of
Statistics, J. S. Cramer obtained the following logit model based on a sample of 2,820
households. (The results given here are based on the method of maximum
likelihood and are after the third iteration.)† The purpose of the logit model was to
determine car ownership as a function of (logarithm of) income. Car ownership was
a binary variable: Y = 1 if a household owns a car, zero otherwise.

$$\hat{L}_i = -2.77231 + 0.347582 \ln \text{Income}$$

t = (−3.35) (4.05)

χ²(1 df) = 16.681 (p value = 0.0000)

where L̂i = estimated logit and where ln Income is the logarithm of income. The χ²
measures the goodness of fit of the model.

*Optional.

**Demaris, op. cit., p. 46.

†J. S. Cramer, An Introduction to the Logit Model for Economists, 2d ed., published and distributed by
Timberlake Consultants Ltd., 2001, p. 33. These results are reproduced from the statistical package
PC-GIVE 10 published by Timberlake Consultants, p. 51.
a. Interpret the estimated logit model.

b. From the estimated logit model, how would you obtain the expression for the 
probability of car ownership? 

c. What is the probability that a household with an income of $20,000 will own a
car? And at an income level of $25,000? What is the rate of change of
probability at the income level of $20,000?

d. Comment on the statistical significance of the estimated logit model. 

15.10. Establish Eq. (15.2.8). 

15.11. In an important study of college graduation rates of all high school matriculants 
and Black-only matriculants, Bowen and Bok obtained the results in Table 15.21, 
based on the logit model.* 


TABLE 15.21 Logistic Regression Model Predicting Graduation Rates, 1989 Entering Cohort 


                                     All Matriculants                 Black Only
                                Parameter  Standard  Odds      Parameter  Standard  Odds
Variable                        Estimate   Error     Ratio     Estimate   Error     Ratio
Intercept                          0.957     0.052                0.455     0.112
Female                             0.280     0.031   1.323        0.265     0.101   1.303
Black                             -0.513     0.056   0.599
Hispanic                          -0.350     0.080   0.705
Asian                              0.122     0.055   1.130
Other race                        -0.330     0.104   0.719
SAT > 1,299                        0.331     0.059   1.393        0.128     0.248   1.137
SAT 1,200-1,299                    0.253     0.055   1.288        0.232     0.179   1.261
SAT 1,100-1,199                    0.350     0.053   1.420        0.308     0.149   1.361
SAT 1,000-1,099                    0.192     0.054   1.211        0.141     0.136   1.151
SAT not available                 -0.330     0.127   0.719        0.048     0.349   1.050
Top 10% of high school class       0.342     0.036   1.407        0.315     0.117   1.370
High school class rank
  not available                   -0.065     0.046   0.937       -0.065     0.148   0.937
High socioeconomic status (SES)    0.283     0.036   1.327        0.557     0.175   1.746
Low SES                           -0.385     0.079   0.680       -0.305     0.143   0.737
SES not available                  0.110     0.050   1.116        0.031     0.172   1.031
SEL-1                              1.092     0.058   2.979        0.712     0.161   2.038
SEL-2                              0.193     0.036   1.212        0.280     0.119   1.323
Women's college                   -0.299     0.069   0.742        0.158     0.269   1.171
Number of observations            32,524                          2,354
-2 log likelihood
  Restricted                      31,553                          2,667
  Unrestricted                    30,160                          2,569
Chi square                        1,393 with 18 d.f.              98 with 14 d.f.

Notes: Bold coefficients are significant at the .05 level; other coefficients are not. The omitted categories in the model are White, male, SAT < 1,000, bottom 90% of high 
school class, middle SES, SEL-3, coed institution. Graduation rates are 6-year, first-school graduation rates, as defined in the notes to Appendix Table D.3.1. Institutional 
selectivity categories are as defined in the notes to Appendix Table D.3.1. See Appendix B for definition of socioeconomic status (SES). 

SEL-1 = institutions with mean combined SAT scores of 1,300 and above. 

SEL-2 = institutions with mean combined SAT scores between 1,150 and 1,299. 

SEL-3 = institutions with mean combined SAT scores below 1,150. 

Source: Bowen and Bok, op. cit., p. 381. 


*William G. Bowen and Derek Bok, The Shape of the River: Long-Term Consequences of Considering Race
in College and University Admissions, Princeton University Press, Princeton, NJ, 1998, p. 381.
a. What general conclusion do you draw about graduation rates of all matriculants
and Black-only matriculants?

b. The odds ratio is the ratio of two odds. Compare two groups of all matriculants,
one with an SAT score of greater than 1,299 and the other with an SAT score of less
than 1,000 (the base category). The odds ratio of 1.393 means the odds of
matriculants in the first category graduating from college are 39 percent higher than
those in the latter category. Do the various odds ratios shown in the table accord
with a priori expectations?

c. What can you say about the statistical significance of the estimated parameters? 
What about the overall significance of the estimated model? 

15.12. In the probit model given in Table 15.11 the disturbance ui has this variance:

$$\sigma_u^2 = \frac{P_i(1 - P_i)}{N_i f_i^2}$$

where fi is the standard normal density function evaluated at F⁻¹(Pi).

a. Given the preceding variance of ui, how would you transform the model in
Table 15.10 to make the resulting error term homoscedastic?

b. Use the data in Table 15.10 to show the transformed data. 

c. Estimate the probit model based on the transformed data and compare the results 
with those based on the original data. 

15.13. Since R² as a measure of goodness of fit is not particularly well suited for the
dichotomous dependent variable models, one suggested alternative is the χ² test
described below:

$$\chi^2 = \sum_{i=1}^{G} \frac{N_i\,(P_i - P_i^*)^2}{P_i^*(1 - P_i^*)}$$

where Ni = number of observations in the ith cell

Pi = actual probability of the event occurring (= ni/Ni)

P*i = estimated probability

G = number of cells (i.e., the number of levels at which Xi is measured, e.g.,
10 in Table 15.4)

It can be shown that, for large samples, χ² is distributed according to the χ²
distribution with (G − k) df, where k is the number of parameters in the estimating
model (k < G).

Apply the preceding χ² test to regression (15.7.1) and comment on the resulting
goodness of fit and compare it with the reported R² value.

15.14. Table 15.22 gives data on the results of spraying rotenone of different
concentrations on the chrysanthemum aphis in batches of approximately fifty. Develop a
suitable model to express the probability of death as a function of the log of X, the log
of dosage, and comment on the results. Also compute the χ² test of fit discussed in
Exercise 15.13.

15.15. Thirteen applicants to a graduate program had quantitative and verbal scores on the 
GRE as listed in Table 15.23. Six students were admitted to the program. 

a. Use the LPM to predict the probability of admission to the program based on 
quantitative and verbal scores in the GRE. 

b. Is this a satisfactory model? If not, what alternative(s) do you suggest?
TABLE 15.22 Toxicity Study of Rotenone on Chrysanthemum Aphis

Concentration,
Milligrams per Liter
  X       log(X)        Total, Ni    Deaths, ni    Pi = ni/Ni
  2.6     0.4150        50            6            0.120
  3.8     0.5797        48           16            0.333
  5.1     0.7076        46           24            0.522
  7.7     0.8865        49           42            0.857
 10.2     1.0086        50           44            0.880

Note: log(X) is the common (base-10) logarithm.

Source: D. J. Finney, Probit Analysis, Cambridge University Press.


TABLE 15.23 GRE Scores

                   GRE Aptitude Test Scores         Admitted to Graduate Program
Student Number     Quantitative, Q    Verbal, V     (Yes = 1, No = 0)
 1                 760                550           1
 2                 600                350           0
 3                 720                320           0
 4                 710                630           1
 5                 530                430           0
 6                 650                570           0
 7                 800                500           1
 8                 650                680           1
 9                 520                660           0
10                 800                250           0
11                 670                480           0
12                 670                520           1
13                 780                710           1

Source: Donald F. Morrison, Applied Linear Statistical Methods, Prentice-Hall, Inc.,
Englewood Cliffs, NJ, 1983, p. 279 (adapted).

15.16. To study the effectiveness of a price discount coupon on a six-pack of a soft drink,
Douglas Montgomery and Elizabeth Peck collected the data shown in Table 15.24.
A sample of 5,500 consumers was randomly assigned to the eleven discount
categories shown in the table, 500 per category. The response variable is whether or not
consumers redeemed the coupon within one month.

a. See if the logit model fits the data, treating the redemption rate as the dependent 
variable and price discount as the explanatory variable. 

b. See if the probit model does as well as the logit model. 


TABLE 15.24 Price of Soda with Discount Coupon

Price Discount    Sample Size    Number of Coupons
Xi (¢)            Ni             Redeemed
 5                500            100
 7                500            122
 9                500            147
11                500            176
13                500            211
15                500            244
17                500            277
19                500            310
21                500            343
23                500            372
25                500            391

Source: Douglas C. Montgomery and Elizabeth A. Peck, Introduction to Linear Regression
Analysis, John Wiley & Sons, New York, 1982, p. 243 (notation changed).
c. What is the predicted redemption rate if the price discount were 17 cents?

d. Estimate the price discount for which 70 percent of the coupons will be 
redeemed. 

15.17. To find out who has a bank account (checking, savings, etc.) and who doesn’t, John 
Caskey and Andrew Peterson estimated a probit model for the years 1977 and 1989, 
using data on U.S. households. The results are given in Table 15.25. The values of 
the slope coefficients given in the table measure the implied effect of a unit change 
in a regressor on the probability that a household has a bank account, these 
marginal effects being calculated at the mean values of the regressors included in 
the model. 

a. For 1977, what is the effect of marital status on ownership of a bank account? 
And for 1989? Do these results make economic sense? 

b. Why is the coefficient for the minority variable negative for both 1977 and 1989? 

c. How can you rationalize the negative sign for the number of children variable? 

d. What does the chi-square statistic given in the table suggest? (Hint: See
Exercise 15.13.)

TABLE 15.25 Probit Regressions Where Dependent Variable Is Ownership of a Deposit Account

                                          1977 Data                  1989 Data
                                    Coefficients  Implied      Coefficients  Implied
                                                  Slope                      Slope
Constant                               -1.06                      -2.20
                                       (3.3)*                     (6.8)*
Income (thousands 1991 $)               0.030      0.002           0.025      0.002
                                       (6.9)                      (6.8)
Married                                 0.127      0.008           0.235      0.023
                                       (0.8)                      (1.7)
Number of children                     -0.131     -0.009          -0.084     -0.008
                                       (3.6)                      (2.0)
Age of head of household (HH)           0.006      0.0004          0.021      0.002
                                       (1.7)                      (6.3)
Education of HH                         0.121      0.008           0.128      0.012
                                       (7.4)                      (7.7)
Male HH                                -0.078     -0.005          -0.144     -0.011
                                       (0.5)                      (0.9)
Minority                               -0.750     -0.050          -0.600     -0.058
                                       (6.8)                      (6.5)
Employed                                0.186      0.012           0.402      0.039
                                       (1.6)                      (3.6)
Homeowner                               0.520      0.035           0.522      0.051
                                       (4.7)                      (5.3)
Log likelihood                       -430.7                     -526.0
Chi-square statistic                  408                        602
  (H0: All coefficients except
  constant equal zero)
Number of observations                2,025                      2,091
Percentage in sample with
  correct predictions                  91                         90

*Numbers in parentheses are t statistics.

Source: John P. Caskey and Andrew Peterson, “Who Has a Bank Account and Who Doesn’t:
1977 and 1989,” Research Working Paper 93-10, Federal Reserve Bank of Kansas City,
October 1993.
15.18. Monte Carlo study. As an aid to understanding the probit model, William Becker
and Donald Waldman assumed the following:*

$$E(Y \mid X) = -1 + 3X$$

Then, letting Yi = −1 + 3Xi + εi, where εi is assumed standard normal (i.e., zero
mean and unit variance), they generated a sample of 35 observations as shown in
Table 15.26.

a. From the data on Y and X given in this table, can you estimate an LPM?
Remember that the true E(Y | X) = −1 + 3X.

b. Given X = 0.48, estimate E(Y | X = 0.48) and compare it with the true
E(Y | X = 0.48). Note X̄ = 0.48.

c. Using the data on Y* and X given in Table 15.26, estimate a probit model. You
may use any statistical package you want. The authors’ estimated probit model is
the following:

$$\hat{Y}^*_i = -0.969 + 2.764X_i$$

Find out P(Y* = 1 | X = 0.48), that is, P(Yi > 0 | X = 0.48). See if your
answer agrees with the authors’ answer of 0.64.

d. The sample standard deviation of the X values given in Table 15.26 is 0.31. What
is the predicted change in probability if X is one standard deviation above the
mean value, that is, what is P(Y* = 1 | X = 0.79)? The authors’ answer for this
change in probability is 0.25.


TABLE 15.26 Hypothetical Data Set Generated by the Model Y = −1 + 3X + ε and Y* = 1 If Y > 0

    Y       Y*    X              Y       Y*    X
-0.3786     0    0.29        -0.3753     0    0.56
 1.1974     1    0.59         1.9701     1    0.61
-0.4648     0    0.14        -0.4054     0    0.17
 1.1400     1    0.81         2.4416     1    0.89
 0.3188     1    0.35         0.8150     1    0.65
 2.2013     1    1.00        -0.1223     0    0.23
 2.4473     1    0.80         0.1428     1    0.26
 0.1153     1    0.40        -0.6681     0    0.64
 0.4110     1    0.07         1.8286     1    0.67
 2.6950     1    0.87        -0.6459     0    0.26
 2.2009     1    0.98         2.9784     1    0.63
 0.6389     1    0.28        -2.3326     0    0.09
 4.3192     1    0.99         0.8056     1    0.54
-1.9906     0    0.04        -0.8983     0    0.74
-0.9021     0    0.37        -0.2355     0    0.17
 0.9433     1    0.94         1.1429     1    0.57
-3.2235     0    0.04        -0.2965     0    0.18
 0.1690     1    0.07

Source: William E. Becker and Donald M. Waldman, “A Graphical Interpretation of Probit
Coefficients,” Journal of Economic Education, Fall 1989, Table 1, p. 373.


*William E. Becker and Donald M. Waldman, "A Graphical Interpretation of Probit Coefficients," 
Journal of Economic Education, vol. 20, no. 4, Fall 1989, pp. 371-378. 
15.19. Table 15.27 on the textbook website gives data for 2,000 women regarding work
(1 = a woman works, 0 = otherwise), age, marital status (1 = married, 0 = otherwise),
number of children, and education (number of years of schooling). Out of a
total of 2,000 women, 657 were recorded as not being wage earners.

a. Using these data, estimate the linear probability model (LPM). 

b. Using the same data, estimate a logit model and obtain the marginal effects of 
the various variables. 

c. Repeat (b) for the probit model.

d. Which model would you choose? Why? 

15.20. For the smokers example discussed in the text (see Section 15.10) download the 
data from the textbook website in Table 15.28. See if the product of education and 
income (i.e., the interaction effect) has any effect on the probability of becoming a 
smoker. 

15.21. Download the data set Benign, which is Table 15.29, from the textbook website. The
variable cancer is a dummy variable, where 1 = had breast cancer and 0 = did not
have breast cancer.* Using the variables age (= age of subject), HIGD (= highest
grade completed in school), CHK (= 0 if subject did not undergo regular medical
checkups and = 1 if subject did undergo regular checkups), AGPI (= age at first
pregnancy), miscarriages (= number of miscarriages), and weight (= weight of
subject), perform a logistic regression to determine whether these variables are statistically
useful for predicting whether a woman will contract breast cancer.
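The χ² goodness-of-fit statistic of Exercise 15.13 is straightforward to compute once the estimated probabilities are available. The Python sketch below implements it as written in that exercise. The function name is hypothetical. The usage lines take Ni and ni from Table 15.22, but the estimated probabilities are made-up placeholders rather than fitted values. The resulting statistic would be compared with the χ² distribution with G − k df.

```python
def grouped_chi_square(N, n, p_star):
    """Chi-square goodness of fit for grouped binary data (Exercise 15.13):
    sum over the G cells of N_i * (P_i - P*_i)**2 / (P*_i * (1 - P*_i)),
    where P_i = n_i / N_i is the observed proportion in cell i."""
    if not (len(N) == len(n) == len(p_star)):
        raise ValueError("N, n, and p_star need one entry per cell")
    stat = 0.0
    for N_i, n_i, ps in zip(N, n, p_star):
        if not 0.0 < ps < 1.0:
            raise ValueError("estimated probabilities must lie strictly between 0 and 1")
        P_i = n_i / N_i
        stat += N_i * (P_i - ps) ** 2 / (ps * (1.0 - ps))
    return stat

# N_i and n_i from Table 15.22; the estimated probabilities are hypothetical placeholders
N = [50, 48, 46, 49, 50]
n = [6, 16, 24, 42, 44]
p_star = [0.15, 0.35, 0.55, 0.80, 0.90]
k = 2                      # parameters in an intercept-plus-slope model
stat = grouped_chi_square(N, n, p_star)
print(f"chi-square = {stat:.3f} with {len(N) - k} df")
```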



15A.1 Maximum Likelihood Estimation of the Logit and Probit Models for Individual (Ungrouped) Data†


As in the text, assume that we are interested in estimating the probability that an individual owns a
house, given the individual’s income X. We assume that this probability can be expressed by the
logistic function (15.5.2), which is reproduced below for convenience.

$$P_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}} \qquad (1)$$


We do not actually observe Pi, but only observe the outcome Y = 1 if an individual owns a house,
and Y = 0 if the individual does not own a house.

Since each Yi is a Bernoulli random variable, we can write

$$\Pr(Y_i = 1) = P_i \qquad (2)$$

$$\Pr(Y_i = 0) = 1 - P_i \qquad (3)$$


*Data are provided on 50 women who were diagnosed as having benign breast disease and 150
age-matched controls, with three controls per case. Trained interviewers administered a standardized
structured questionnaire to collect information from each subject (see Pastides, et al. [1983] and
Pastides, et al. [1985]).

†The following discussion leans heavily on John Neter, Michael H. Kutner, Christopher J. Nachtsheim,
and William Wasserman, Applied Linear Statistical Models, 4th ed., Irwin, 1996, pp. 573–574.
Suppose we have a random sample of n observations. Letting f(Yi) denote the probability that Yi =
1 or 0, the joint probability of observing the n Y values, i.e., f(Y1, Y2, ..., Yn), is given as:

$$f(Y_1, Y_2, \ldots, Y_n) = \prod_{i=1}^{n} f(Y_i) = \prod_{i=1}^{n} P_i^{Y_i}(1 - P_i)^{1 - Y_i} \qquad (4)$$

where ∏ is the product operator. Note that we can write the joint probability density function as a
product of individual density functions because each Yi is drawn independently and each Yi has the
same (logistic) density function. The joint probability given in Eq. (4) is known as the likelihood
function (LF).

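Equation (4) is a little awkward to manipulate. But if we take its natural logarithm, we obtain what
is called the log likelihood function (LLF):

$$
\begin{aligned}
\ln f(Y_1, Y_2, \ldots, Y_n) &= \sum_{i=1}^{n} \left[ Y_i \ln P_i + (1 - Y_i) \ln(1 - P_i) \right] \\
&= \sum_{i=1}^{n} \left[ Y_i \ln P_i - Y_i \ln(1 - P_i) + \ln(1 - P_i) \right] \\
&= \sum_{i=1}^{n} Y_i \ln\left(\frac{P_i}{1 - P_i}\right) + \sum_{i=1}^{n} \ln(1 - P_i)
\end{aligned}
\qquad (5)
$$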

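From Eq. (1) it is easy to verify that

$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 X_i \qquad (6)$$

as well as

$$\ln(1 - P_i) = -\ln\left[1 + e^{\beta_1 + \beta_2 X_i}\right] \qquad (7)$$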

Using Eqs. (6) and (7), we can write the LLF (5) as:

$$\ln f(Y_1, Y_2, \ldots, Y_n) = \sum_{i=1}^{n} Y_i(\beta_1 + \beta_2 X_i) - \sum_{i=1}^{n} \ln\left[1 + e^{\beta_1 + \beta_2 X_i}\right] \qquad (8)$$

As you can see from Eq. (8), the log likelihood function is a function of the parameters β1 and β2,
since the Xi are known.

In ML our objective is to maximize the LF (or LLF), that is, to obtain the values of the unknown
parameters in such a manner that the probability of observing the given Y’s is as high (maximum) as
possible. For this purpose, we differentiate Eq. (8) partially with respect to each unknown, set the
resulting expressions to zero, and solve the resulting expressions. One can then apply the second-order
condition of maximization to verify that the values of the parameters we have obtained do in fact
maximize the LF.
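For reference, the differentiation of Eq. (8) can be carried out explicitly. From Eq. (1), the derivative of ln[1 + e^(β1 + β2Xi)] with respect to β1 is Pi. The first-order conditions and the second derivatives are therefore as follows (a derivation added here, not reproduced from the cited source):

$$
\begin{aligned}
\frac{\partial \ln f}{\partial \beta_1} &= \sum_{i=1}^{n} (Y_i - P_i) = 0 \\
\frac{\partial \ln f}{\partial \beta_2} &= \sum_{i=1}^{n} (Y_i - P_i)\,X_i = 0
\end{aligned}
$$

$$
\frac{\partial^2 \ln f}{\partial \beta_1^2} = -\sum_{i=1}^{n} P_i(1 - P_i), \qquad
\frac{\partial^2 \ln f}{\partial \beta_1 \partial \beta_2} = -\sum_{i=1}^{n} P_i(1 - P_i)X_i, \qquad
\frac{\partial^2 \ln f}{\partial \beta_2^2} = -\sum_{i=1}^{n} P_i(1 - P_i)X_i^2
$$

Because Pi depends on β1 and β2 through Eq. (1), the first-order conditions are nonlinear in the parameters. The matrix of second derivatives is negative definite whenever the Xi are not all equal, so the LLF is concave. A solution of the first-order conditions, when one exists, is therefore the unique maximum.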

So, you have to differentiate Eq. (8) with respect to β1 and β2 and proceed as indicated. As you will
quickly realize, the resulting expressions become highly nonlinear in the parameters and no explicit
solutions can be obtained. That is why we will have to use one of the methods of nonlinear estimation
discussed in the previous chapter to obtain numerical solutions. Once the numerical values of β1 and
β2 are obtained, we can easily estimate Eq. (1).
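The following Python sketch illustrates one such numerical method, Newton-Raphson, applied to the first-order conditions of Eq. (8). It is a minimal illustration that assumes a single regressor and no perfect separation of the 0s and 1s. With perfect separation, as in Exercise 15.2, no finite maximum exists. It is not the algorithm of any particular package. For illustration it uses the Y* and X values of Table 15.26, which the exercises use for a probit model, not a logit model.

```python
import math

def fit_logit_newton(x, y, tol=1e-10, max_iter=50):
    """Maximize the logit log likelihood, Eq. (8), for P_i = 1/(1 + exp(-(b1 + b2*X_i)))
    by Newton-Raphson. Returns (b1, b2, maximized log likelihood)."""
    b1, b2 = 0.0, 0.0                      # starting values
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b1 + b2 * xi))) for xi in x]
        # Score vector: first derivatives of Eq. (8)
        g1 = sum(yi - pi for yi, pi in zip(y, p))
        g2 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Information matrix: minus the second derivatives of Eq. (8)
        w = [pi * (1.0 - pi) for pi in p]
        h11 = sum(w)
        h12 = sum(wi * xi for wi, xi in zip(w, x))
        h22 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h11 * h22 - h12 * h12
        if det <= 0.0:
            raise ValueError("information matrix is singular")
        # Newton step = inverse(information) times score, written out for the 2x2 case
        step1 = (h22 * g1 - h12 * g2) / det
        step2 = (h11 * g2 - h12 * g1) / det
        b1 += step1
        b2 += step2
        if max(abs(step1), abs(step2)) < tol:
            break
    else:
        raise RuntimeError("no convergence; check for perfect separation")
    # Log likelihood in the form of Eq. (8)
    ll = sum(yi * (b1 + b2 * xi) - math.log(1.0 + math.exp(b1 + b2 * xi))
             for xi, yi in zip(x, y))
    return b1, b2, ll

# Y* and X from Table 15.26 (left-hand columns, then right-hand columns)
x_data = [0.29, 0.59, 0.14, 0.81, 0.35, 1.00, 0.80, 0.40, 0.07, 0.87, 0.98, 0.28,
          0.99, 0.04, 0.37, 0.94, 0.04, 0.07, 0.56, 0.61, 0.17, 0.89, 0.65, 0.23,
          0.26, 0.64, 0.67, 0.26, 0.63, 0.09, 0.54, 0.74, 0.17, 0.57, 0.18]
y_data = [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0,
          1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0]

b1_hat, b2_hat, loglik = fit_logit_newton(x_data, y_data)
print(f"b1 = {b1_hat:.4f}, b2 = {b2_hat:.4f}, log likelihood = {loglik:.4f}")
```

At the solution, the two sums in the first-order conditions are zero to within rounding. This provides a quick check that the iterations have converged.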

The ML procedure for the probit model is similar to that for the logit model, except that in Eq. (1)
we use the normal CDF rather than the logistic CDF. The resulting expression becomes rather
complicated, but the general idea is the same. So, we will not pursue it any further.
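For reference only, the probit counterpart of Eq. (8) and its first-order conditions can be written down. Here Φ and φ denote the standard normal CDF and density; these symbols are introduced for this sketch. With Φi = Φ(β1 + β2Xi) and φi = φ(β1 + β2Xi):

$$\ln f(Y_1, \ldots, Y_n) = \sum_{i=1}^{n} \left[ Y_i \ln \Phi_i + (1 - Y_i) \ln(1 - \Phi_i) \right]$$

$$\sum_{i=1}^{n} \frac{(Y_i - \Phi_i)\,\phi_i}{\Phi_i(1 - \Phi_i)} = 0, \qquad \sum_{i=1}^{n} \frac{(Y_i - \Phi_i)\,\phi_i}{\Phi_i(1 - \Phi_i)}\,X_i = 0$$

If the normal CDF and density are replaced by their logistic counterparts, the weight φi/[Φi(1 − Φi)] equals 1. This recovers the logit conditions Σ(Yi − Pi) = 0 and Σ(Yi − Pi)Xi = 0, which is one way to see that the two procedures differ only in the CDF used.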




Chapter 16

Panel Data Regression Models


In Chapter 1 we discussed briefly the types of data that are generally available for
empirical analysis, namely, time series, cross section, and panel. In time series data we observe
the values of one or more variables over a period of time (e.g., GDP for several quarters
or years). In cross-section data, values of one or more variables are collected for several
sample units, or subjects, at the same point in time (e.g., crime rates for 50 states in the
United States for a given year). In panel data the same cross-sectional unit (say a family
or a firm or a state) is surveyed over time. In short, panel data have space as well as time
dimensions.

We have already seen an example of this in Table 1.1, which gives data on eggs produced 
and their prices for 50 states in the United States for years 1990 and 1991. For any given 
year, the data on eggs and their prices represent a cross-sectional sample. For any given 
state, there are two time series observations on eggs and their prices. Thus, we have in all 
100 (pooled) observations on eggs produced and their prices. 

Another example of panel data was given in Table 1.2, which gives data on investment,
value of the firm, and capital stock for four companies for the period 1935–1954. The data
for each company over the period 1935–1954 constitute time series data, with 20
observations; data for all four companies for a given year are an example of cross-section data, with
only four observations; and data for all the companies for all the years are an example of
panel data, with a total of 80 observations.
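The nesting of the two dimensions can be sketched in a few lines of Python. The sketch assumes a balanced panel stored in long format, with one record per unit and year. The firm labels are hypothetical placeholders, not the company names of Table 1.2. The point is only that the same 80 records can be grouped by firm, giving four time series of 20 observations, or by year, giving twenty cross sections of 4 observations.

```python
from collections import defaultdict

# Hypothetical labels for the four companies; years 1935-1954 inclusive (20 years)
firms = ["firm_A", "firm_B", "firm_C", "firm_D"]
years = list(range(1935, 1955))

# Long format: one record per (cross-sectional unit, time period) pair
panel = [(firm, year) for firm in firms for year in years]

by_firm = defaultdict(list)   # each value is one firm's time series of years
by_year = defaultdict(list)   # each value is one year's cross section of firms
for firm, year in panel:
    by_firm[firm].append(year)
    by_year[year].append(firm)

print(len(panel))              # 80 pooled observations
print(len(by_firm["firm_A"]))  # 20 time-series observations for one firm
print(len(by_year[1940]))      # 4 cross-sectional observations for one year
```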

There are other names for panel data, such as pooled data (pooling of time series
and cross-sectional observations), combination of time series and cross-section data,
micropanel data, longitudinal data (a study over time of a variable or group of subjects),
event history analysis (studying the movement over time of subjects through successive
states or conditions), and cohort analysis (e.g., following the career path of 1965 graduates
of a business school). Although there are subtle variations, all these names essentially
connote movement over time of cross-sectional units. We will therefore use the term panel data
in a generic sense to include one or more of these terms. And we will call regression
models based on such data panel data regression models.

Panel data are now being used increasingly in economic research. Some of the well-known panel data sets are:

1. The Panel Study of Income Dynamics (PSID), conducted by the Institute for Social Research at the University of Michigan. Started in 1968, the PSID collects data each year on some 5,000 families about various socioeconomic and demographic variables.


2. The Bureau of the Census of the Department of Commerce conducts a survey similar to
PSID, called the Survey of Income and Program Participation (SIPP). Four times a 
year respondents are interviewed about their economic condition. 

3. The German Socio-Economic Panel (GESOEP) studied 1,761 individuals every year between 1984 and 2002. Information on year of birth, gender, life satisfaction, marital status, individual labor earnings, and annual hours of work was collected for each individual for the period 1984 to 2002.

There are also many other surveys that are conducted by various governmental agencies, 
such as: 

Household, Income and Labor Dynamics in Australia Survey (HILDA) 

British Household Panel Survey (BHPS) 

Korean Labor and Income Panel Study (KLIPS) 

At the outset a warning is in order: The topic of panel data regressions is vast, and some of the mathematics and statistics involved are quite complicated. We only hope to touch on some of the essentials of the panel data regression models, leaving the details for the references.¹ But be forewarned that some of these references are highly technical. Fortunately, user-friendly software packages such as LIMDEP, PC-GIVE, SAS, STATA, SHAZAM, and EViews, among others, have made the task of actually implementing panel data regressions quite easy.

16.1 Why Panel Data? 

What are the advantages of panel data over cross-section or time series data? Baltagi lists the following advantages of panel data:²

1. Since panel data relate to individuals, firms, states, countries, etc., over time, there is bound to be heterogeneity in these units. The techniques of panel data estimation can take such heterogeneity explicitly into account by allowing for subject-specific variables, as we shall show shortly. We use the term subject in a generic sense to include microunits such as individuals, firms, states, and countries.

2. By combining time series of cross-section observations, panel data give “more informative data, more variability, less collinearity among variables, more degrees of freedom and more efficiency.”

3. By studying the repeated cross section of observations, panel data are better suited to 
study the dynamics of change. Spells of unemployment, job turnover, and labor mobility 
are better studied with panel data. 

4. Panel data can better detect and measure effects that simply cannot be observed in pure cross-section or pure time series data. For example, the effects of minimum wage laws on employment and earnings can be better studied if we include successive waves of minimum wage increases in the federal and/or state minimum wages.

1 Some of the references are G. Chamberlain, “Panel Data,” in Handbook of Econometrics, vol. II, Z. Griliches and M. D. Intriligator, eds., North-Holland Publishers, 1984, Chapter 22; C. Hsiao, Analysis of Panel Data, Cambridge University Press, 1986; G. G. Judge, R. C. Hill, W. E. Griffiths, H. Lütkepohl, and T. C. Lee, Introduction to the Theory and Practice of Econometrics, 2d ed., John Wiley & Sons, New York, 1985, Chapter 11; W. H. Greene, Econometric Analysis, 6th ed., Prentice-Hall, Englewood Cliffs, NJ, 2008, Chapter 9; Badi H. Baltagi, Econometric Analysis of Panel Data, John Wiley and Sons, New York, 1995; and J. M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass., 1999. For a detailed treatment of the subject with empirical applications, see Edward W. Frees, Longitudinal and Panel Data: Analysis and Applications in the Social Sciences, Cambridge University Press, New York, 2004.

2 Baltagi, op. cit., pp. 3-6.

5. Panel data enable us to study more complicated behavioral models. For example, phenomena such as economies of scale and technological change can be better handled by panel data than by pure cross-section or pure time series data.

6. By making data available for several thousand units, panel data can minimize the bias 
that might result if we aggregate individuals or firms into broad aggregates. 

In short, panel data can enrich empirical analysis in ways that may not be possible if we use only 
cross-section or time series data. This is not to suggest that there are no problems with panel 
data modeling. We will discuss them after we cover some theory and discuss some examples. 

16.2 Panel Data: An Illustrative Example 

To set the stage, let us consider a concrete example. Consider the data given as Table 16.1 on the textbook website, which were originally collected by Professor Moshe Kim and are reproduced from William Greene.³ The data analyze the costs of six airline firms for the period 1970-1984, for a total of 90 panel data observations.

The variables are defined as: I = airline id; T = year id; Q = output, in revenue passenger miles, an index number; C = total cost, in $1,000; PF = fuel price; and LF = load factor, the average capacity utilization of the fleet.

Suppose we are interested in finding out how total cost (C) behaves in relation to output (Q), fuel price (PF), and load factor (LF). In short, we wish to estimate an airline cost function.

How do we go about estimating this function? Of course, we can estimate the cost function for each airline using the data for 1970-1984 (i.e., a time series regression). This can be accomplished with the usual ordinary least squares (OLS) procedure. We will have in all six cost functions, one for each airline. But then we neglect the information about the other airlines, which operate in the same (regulatory) environment.

We can also estimate a cross-section cost function (i.e., a cross-section regression). We will have in all 15 cross-section regressions, one for each year. But this would not make much sense in the present context, for we have only six observations per year and there are three explanatory variables (plus the intercept term); we will have very few degrees of freedom to do a meaningful analysis. Also, we will not “exploit” the panel nature of our data.

Incidentally, the panel data in our example is called a balanced panel; a panel is said to be balanced if each subject (firm, individual, etc.) has the same number of observations. If the number of observations differs across entities, then we have an unbalanced panel. For most of this chapter, we will deal with balanced panels. In the panel data literature you will also come across the terms short panel and long panel. In a short panel the number of cross-sectional subjects, N, is greater than the number of time periods, T. In a long panel, it is T that is greater than N. As we discuss later, the estimating techniques can depend on whether we have a short panel or a long one.
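Both distinctions can be read off the subject identifiers alone. The sketch below is a minimal illustration, assuming one subject id per observation; the helper `describe_panel` and the label "square" for the case N = T are hypothetical additions, not terminology from the text.

```python
from collections import Counter


def describe_panel(subject_ids):
    """Summarize a panel from one subject id per observation (hypothetical helper)."""
    counts = Counter(subject_ids)             # number of observations per subject
    n_subjects = len(counts)                  # N
    n_periods = max(counts.values())          # T (longest series if unbalanced)
    balanced = len(set(counts.values())) == 1
    if n_subjects > n_periods:
        shape = "short"                       # N > T
    elif n_periods > n_subjects:
        shape = "long"                        # T > N
    else:
        shape = "square"                      # N == T: neither definition applies
    return {"N": n_subjects, "T": n_periods, "obs": len(subject_ids),
            "balanced": balanced, "shape": shape}


# Layout of the airline data: 6 airlines, each observed in every year 1970-1984.
airline_ids = [airline for airline in range(1, 7) for year in range(1970, 1985)]
print(describe_panel(airline_ids))
# {'N': 6, 'T': 15, 'obs': 90, 'balanced': True, 'shape': 'long'}
```

By the definitions just given, the airline data form a balanced long panel, since T = 15 exceeds N = 6.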

What, then, are the options? There are four possibilities: 

1. Pooled OLS model. We simply pool all 90 observations and estimate a “grand” 
regression, neglecting the cross-section and time series nature of our data. 

2. The fixed effects least squares dummy variable (LSDV) model. Here we pool all 90 observations, but allow each cross-section unit (i.e., airline in our example) to have its own (intercept) dummy variable.

3 William H. Greene, Econometric Analysis, 6th ed., 2008. Data are located at http://pages.stern.nyu.edu/~wgreen/Text/econometricanalysis.htm.
3. The fixed effects within-group model. Here also we pool all 90 observations, but for each airline we express each variable as a deviation from its mean value and then estimate an OLS regression on such mean-corrected or “de-meaned” values.

4. The random effects model (REM). Unlike the LSDV model, in which we allow each 
airline to have its own (fixed) intercept value, we assume that the intercept values are a 
random drawing from a much bigger population of airlines. 

We now discuss each of these methods using the data given in Table 16.1. (See textbook 
website.) 

16.3 Pooled OLS Regression or Constant Coefficients Model 

Consider the following model:

$$C_{it} = \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \tag{16.3.1}$$

$$i = 1, 2, \ldots, 6; \qquad t = 1, 2, \ldots, 15$$

where i is the ith subject and t is the time period for the variables we defined previously. We have chosen the linear cost function for illustrative purposes, but in Exercise 16.10 you are asked to estimate a log-linear, or double-log, function, in which case the slope coefficients will give the elasticity estimates.

Notice that we have pooled all 90 observations together, which implicitly assumes that the regression coefficients are the same for all the airlines. That is, there is no distinction between the airlines: one airline is as good as the other, an assumption that may be difficult to maintain.

It is assumed that the explanatory variables are nonstochastic or, if they are stochastic, that they are uncorrelated with the error term. Sometimes it is assumed that the explanatory variables are strictly exogenous. A variable is said to be strictly exogenous if it does not depend on current, past, and future values of the error term $u_{it}$.

It is also assumed that the error term is $u_{it} \sim \text{iid}(0, \sigma_u^2)$, that is, it is independently and identically distributed with zero mean and constant variance. For the purpose of hypothesis testing, it may be assumed that the error term is also normally distributed. Notice the double-subscripted notation in Eq. (16.3.1), which should be self-explanatory.

Let us first present the results of the estimated equation (16.3.1) and then discuss some of the problems with this model. The regression results based on EViews, Version 6 are presented in Table 16.2.

If you examine the results of the pooled regression and apply the conventional criteria, you will see that all the regression coefficients are not only highly statistically significant but are also in accord with prior expectations and that the $R^2$ value is very high. The only “fly in the ointment” is that the estimated Durbin-Watson statistic is quite low, suggesting that perhaps there is autocorrelation and/or spatial correlation in the data. Of course, as we know, a low Durbin-Watson could also be due to specification errors.

The major problem with this model is that it does not distinguish between the various airlines, nor does it tell us whether the response of total cost to the explanatory variables over time is the same for all the airlines. In other words, by lumping together different airlines at different times we camouflage the heterogeneity (individuality or uniqueness) that may exist among the airlines. Another way of stating this is that the individuality of each subject is subsumed in the disturbance term $u_{it}$. As a consequence, it is quite possible that the error term may be correlated with some of the regressors included in the model. If that is the case, the estimated coefficients in Eq. (16.3.1) may be biased as well as inconsistent.
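A small simulation on synthetic data (not the airline data) makes this danger concrete. In the sketch below, six hypothetical subjects each have an unobserved effect $\alpha_i$ that is lower for subjects whose regressor values are higher, so $\alpha_i$, once buried in the pooled error term, is correlated with x. The true common slope is set to 2.0 by assumption, and the pooled OLS slope is compared with the within-group (de-meaned) slope of the kind discussed in Section 16.5.

```python
import random
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x, with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(groups, x, y):
    """Slope after subtracting each group's own means from x and y (no intercept)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, a, b in zip(groups, x, y):
        sums[g][0] += a
        sums[g][1] += b
        sums[g][2] += 1
    mx = {g: s[0] / s[2] for g, s in sums.items()}
    my = {g: s[1] / s[2] for g, s in sums.items()}
    dx = [a - mx[g] for g, a in zip(groups, x)]
    dy = [b - my[g] for g, b in zip(groups, y)]
    return sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)


random.seed(42)
TRUE_SLOPE = 2.0                        # assumed common slope for every subject
groups, x, y = [], [], []
for i in range(6):                      # six hypothetical subjects
    alpha_i = -3.0 * i                  # assumed unobserved effect
    for t in range(15):                 # fifteen periods each
        xi = i + random.gauss(0, 0.5)   # larger alpha_i goes with smaller x
        groups.append(i)
        x.append(xi)
        y.append(alpha_i + TRUE_SLOPE * xi + random.gauss(0, 0.2))

pooled = ols_slope(x, y)
within = within_slope(groups, x, y)
print(f"pooled slope: {pooled:.3f}, within slope: {within:.3f}")
```

Under this assumed design the pooled slope should land far below 2.0 (around -0.8, i.e., even the sign is wrong), while the within slope stays close to 2.0.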
TABLE 16.2

Dependent Variable: C
Method: Least Squares
Included observations: 90

Variable          Coefficient    Std. Error    t Statistic    Prob.
C (intercept)     1158559.       360592.       3.212930       0.0018
Q                 2026114.       61806.95      32.78134       0.0000
PF                1.225348       0.103722      11.81380       0.0000
LF               -3065753.       696327.3     -4.402747       0.0000

R-squared             0.946093    Mean dependent var.    1122524.
Adjusted R-squared    0.944213    S.D. dependent var.    1192075.
S.E. of regression    281559.5    F-statistic            503.1176
Sum squared resid.    6.82E+13    Prob. (F-statistic)    0.000000
Durbin-Watson stat.   0.434162

Recall that one of the important assumptions of the classical linear regression model is that 
there is no correlation between the regressors and the disturbance or error term. 

To see how the error term may be correlated with the regressors, let us consider the following revision of model (16.3.1):

$$C_{it} = \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + \beta_5 M_{it} + u_{it} \tag{16.3.2}$$

where the additional variable M = management philosophy or management quality. Of the variables included in Eq. (16.3.2), only the variable M is time-invariant (or time-constant) because it varies among subjects but is constant over time for a given subject (airline).

Although it is time-invariant, the variable M is not directly observable and therefore we cannot measure its contribution to the cost function. We can, however, do this indirectly if we write Eq. (16.3.2) as

$$C_{it} = \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + \alpha_i + u_{it} \tag{16.3.3}$$

where $\alpha_i$, called the unobserved, or heterogeneity, effect, reflects the impact of M on cost. Note that for simplicity we have shown only the unobserved effect of M on cost, but in reality there may be more such unobserved effects, for example, the nature of ownership (privately owned or publicly owned), whether it is a minority-owned company, whether the CEO is a man or a woman, etc. Although such variables may differ among the subjects (airlines), they will probably remain the same for any given subject over the sample period.

Since $\alpha_i$ is not directly observable, why not consider it random and include it in the error term $u_{it}$, and thereby consider the composite error term $v_{it} = \alpha_i + u_{it}$? We now write Eq. (16.3.3) as:

$$C_{it} = \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + v_{it} \tag{16.3.4}$$

But if the $\alpha_i$ term included in the error term $v_{it}$ is correlated with any of the regressors in Eq. (16.3.4), we have a violation of one of the key assumptions of the classical linear regression model, namely, that the error term is not correlated with the regressors. As we know, in this situation the OLS estimates are not only biased but also inconsistent.

There is a real possibility that the unobservable $\alpha_i$ is correlated with one or more of the regressors. For example, the management of one airline may be astute enough to buy futures contracts on fuel to avoid severe price fluctuations. This will have the effect of lowering the cost of airline services. Moreover, because the same $\alpha_i$ appears in the composite error of every period, it can be shown that $\operatorname{cov}(v_{it}, v_{is}) = \sigma_\alpha^2$ for $t \neq s$, which is nonzero; therefore, the (unobserved) heterogeneity induces autocorrelation, and we will have to pay attention to it. We will show later how this problem can be handled.
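The autocorrelation result follows from a short argument under assumptions that are standard but not stated above: $\alpha_i$ has mean zero and variance $\sigma_\alpha^2$, $u_{it}$ is iid with mean zero and variance $\sigma_u^2$, and the two are uncorrelated. Then, for $t \neq s$, $\operatorname{cov}(v_{it}, v_{is}) = E[(\alpha_i + u_{it})(\alpha_i + u_{is})] = \sigma_\alpha^2$, and the correlation between the two composite errors is $\sigma_\alpha^2/(\sigma_\alpha^2 + \sigma_u^2)$. The simulation below checks this numerically; the variances are arbitrary assumptions.

```python
import random


def correlation(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


random.seed(0)
SIGMA_ALPHA, SIGMA_U = 1.0, 1.0    # assumed standard deviations
N_SUBJECTS = 20000

v_t, v_s = [], []                  # composite errors of each subject in two periods
for _ in range(N_SUBJECTS):
    alpha = random.gauss(0, SIGMA_ALPHA)           # the same alpha_i in both periods
    v_t.append(alpha + random.gauss(0, SIGMA_U))   # v_it = alpha_i + u_it
    v_s.append(alpha + random.gauss(0, SIGMA_U))   # v_is = alpha_i + u_is

rho_theory = SIGMA_ALPHA ** 2 / (SIGMA_ALPHA ** 2 + SIGMA_U ** 2)
rho_sample = correlation(v_t, v_s)
print(f"theoretical: {rho_theory:.3f}, simulated: {rho_sample:.3f}")
```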
The question, therefore, is how we account for the unobservable, or heterogeneity, effect(s)
so that we can obtain consistent and/or efficient estimates of the parameters of the variables 
of prime interest, which are output, fuel price, and load factor in our case. Our prime interest 
may not be in obtaining the impact of the unobservable variables because they remain the 
same for a given subject. That is why such unobservable, or heterogeneity, effects are called 
nuisance parameters. How then do we proceed? It is to this question we now turn. 

16.4 The Fixed Effect Least-Squares Dummy Variable (LSDV) Model

The least-squares dummy variable (LSDV) model allows for heterogeneity among subjects 
by allowing each entity to have its own intercept value, as shown in model (16.4.1). Again, 
we continue with our airlines example. 

$$C_{it} = \beta_{1i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \tag{16.4.1}$$

$$i = 1, 2, \ldots, 6; \qquad t = 1, 2, \ldots, 15$$

Notice that we have put the subscript i on the intercept term to suggest that the intercepts of the 
six airlines may be different. The difference may be due to special features of each airline, such 
as managerial style, managerial philosophy, or the type of market each airline is serving. 

In the literature, model (16.4.1) is known as the fixed effects (regression) model (FEM). The term “fixed effects” is due to the fact that, although the intercept may differ across subjects (here the six airlines), each entity’s intercept does not vary over time; that is, it is time-invariant. Notice that if we were to write the intercept as $\beta_{1it}$, it would suggest that the intercept of each entity or individual is time-variant. It may be noted that the FEM given in Eq. (16.4.1) assumes that the (slope) coefficients of the regressors do not vary across individuals or over time.

Before proceeding further, it may be useful to visualize the difference between the pooled regression model and the LSDV model. For simplicity, assume that we want to regress total cost on output only. In Figure 16.1 we show this cost function estimated for two airline companies separately, as well as the cost function obtained if we pool the data for the two companies; this is equivalent to neglecting the fixed effects.⁴ You can see from Figure 16.1 how the pooled regression can bias the slope estimate.

FIGURE 16.1
Bias from ignoring fixed effects.
[Figure: total cost against output for two airlines, showing each airline's own regression line and the pooled regression line.]

TABLE 16.3

How do we actually allow for the (fixed effect) intercept to vary among the airlines? We can 
easily do this by using the dummy variable technique, particularly the differential intercept 
dummy technique, which we learned in Chapter 9. Now we write Eq. (16.4.1) as: 

$$C_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 D_{4i} + \alpha_5 D_{5i} + \alpha_6 D_{6i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \tag{16.4.2}$$

where $D_{2i} = 1$ for airline 2, 0 otherwise; $D_{3i} = 1$ for airline 3, 0 otherwise; and so on. Notice that since we have six airlines, we have introduced only five dummy variables to avoid falling into the dummy-variable trap (i.e., the situation of perfect collinearity). Here we are treating airline 1 as the base, or reference, category. Of course, you can choose any airline as the reference point. As a result, the intercept $\alpha_1$ is the intercept value of airline 1, and the other $\alpha$ coefficients represent by how much the intercept values of the other airlines differ from the intercept value of the first airline. Thus, $\alpha_2$ tells by how much the intercept value of the second airline differs from $\alpha_1$. The sum $(\alpha_1 + \alpha_2)$ gives the actual value of the intercept for airline 2. The intercept values of the other airlines can be computed similarly. Keep in mind that if you want to introduce a dummy for each airline, you will have to drop the (common) intercept; otherwise, you will fall into the dummy-variable trap.
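The dummy coding in Eq. (16.4.2) can be sketched directly. Assuming airline ids 1 to 6, each observed 15 times, the hypothetical helpers below build the five differential intercept dummies with airline 1 as the reference. They show why keeping all six dummies alongside the intercept causes perfect collinearity, and they recover an airline's actual intercept as $\alpha_1 + \alpha_j$.

```python
def differential_dummies(airline_ids, n_airlines=6, base=1):
    """Rows of D_2..D_6; airline `base` is the reference category and gets no dummy."""
    return [[1 if a == j else 0 for j in range(1, n_airlines + 1) if j != base]
            for a in airline_ids]


def airline_intercept(alpha, j):
    """Actual intercept of airline j, given alpha = [a1, a2, ..., a6] from Eq. (16.4.2)."""
    return alpha[0] if j == 1 else alpha[0] + alpha[j - 1]


airline_ids = [a for a in range(1, 7) for _ in range(15)]   # 6 airlines x 15 years
D = differential_dummies(airline_ids)

# Dummy-variable trap: with all six dummies, every row sums to 1, i.e., the
# dummy columns add up exactly to the intercept column of ones.
all_six = [[1 if a == j else 0 for j in range(1, 7)] for a in airline_ids]
trap = all(sum(row) == 1 for row in all_six)

print(len(D), len(D[0]), D[0], D[15], trap)
# 90 5 [0, 0, 0, 0, 0] [1, 0, 0, 0, 0] True
```

For example, with the purely illustrative estimates `[10.0, 1.5, -2.0, 0.0, 3.0, 0.5]`, `airline_intercept(..., 3)` returns `8.0`, that is, $\alpha_1 + \alpha_3$.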

The results of the model (16.4.2) for our data are presented in Table 16.3. 

The first thing to notice about these results is that all the differential intercept coefficients are individually highly statistically significant, suggesting that perhaps the six airlines are heterogeneous and, therefore, the pooled regression results given in Table 16.2 may be suspect. The values of the slope coefficients given in Tables 16.2 and 16.3 are also different, again casting some doubt on the results given in Table 16.2. It seems model (16.4.1) is better than model (16.3.1). In passing, note that OLS applied to a fixed effect model produces estimators that are called fixed effect estimators.
Dependent Variable: TC
Method: Least Squares
R-squared    0.971642
[Coefficient estimates and remaining summary statistics not recoverable.]
4 Adapted from the unpublished notes of Alan Duncan.
We can provide a formal test of the two models. In relation to model (16.4.1), model (16.3.1) is a restricted model in that it imposes a common intercept for all the airlines. Therefore, we can use the restricted F test discussed in Chapter 8. Using formula (8.6.10), the reader can check that in the present case the F value is:

$$F = \frac{(R^2_{UR} - R^2_{R})/m}{(1 - R^2_{UR})/(n - k)} = \frac{(0.971642 - 0.946093)/5}{(1 - 0.971642)/81} \approx 14.60$$

Note: The restricted and unrestricted $R^2$ values are obtained from Tables 16.2 and 16.3, respectively. Also note that the number of restrictions is 5 (why?).
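The arithmetic can be reproduced in a few lines. This sketch assumes the $R^2$ form of the restricted F test, with m = 5 restrictions (the five differential intercepts set to zero), n = 90 observations, and k = 9 parameters in the unrestricted LSDV model (six intercepts plus three slopes), so that n - k = 81 as in the denominator above.

```python
def restricted_f(r2_ur, r2_r, m, n, k):
    """Restricted F statistic in R-squared form.

    r2_ur, r2_r: R^2 of the unrestricted and restricted models
    m: number of restrictions; n: observations; k: parameters in the unrestricted model
    """
    return ((r2_ur - r2_r) / m) / ((1 - r2_ur) / (n - k))


F = restricted_f(r2_ur=0.971642, r2_r=0.946093, m=5, n=90, k=9)
print(round(F, 2))   # 14.6
```

The p-value would come from the F(5, 81) distribution, via a table or a statistics library; it is not computed here.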

The null hypothesis here is that all the differential intercepts are equal to zero. The computed F value for 5 numerator and 81 denominator df is highly statistically significant. Therefore, we reject the null hypothesis that all the (differential) intercepts are zero. If the F value were not statistically significant, we would have concluded that there is no difference in the intercepts of the six airlines. In this case, we would have pooled all 90 of the observations, as we did in the pooled regression given in Table 16.2.

Model (16.4.1) is known as a one-way fixed effects model because we have allowed the intercepts to differ between airlines. But we can also allow for time effect if we believe that the cost function changes over time because of factors such as technological changes, changes in government regulation and/or tax policies, and other such effects. Such a time effect can be easily accounted for if we introduce time dummies, one for each year from 1970 to 1984. Since we have data for 15 years, we can introduce 14 time dummies (why?) and extend model (16.4.1) by adding these variables. If we do that, the model that emerges is called a two-way fixed effects model because we have allowed for both individual and time effects.

In the present example, if we add the time dummies, we will have in all 23 coefficients to estimate: the common intercept, five airline dummies, 14 time dummies, and three slope coefficients. As you can see, we will consume several degrees of freedom. Furthermore, if we decide to allow the slope coefficients to differ among the companies, we can interact the five firm (airline) dummies with each of the three explanatory variables and introduce differential slope dummy coefficients. Then we will have to estimate 15 additional coefficients (five dummies interacted with three explanatory variables). As if this is not enough, if we interact the 14 time dummies with the three explanatory variables, we will have 42 more coefficients to estimate, or 80 coefficients in all from only 90 observations. As you can see, we will have hardly any degrees of freedom left.
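The counts in the preceding paragraph follow a simple rule: one common intercept, one dummy fewer than the number of categories for each effect, the slopes, and optionally one interaction per dummy and slope. The hypothetical helper below sketches that bookkeeping for the airline setup.

```python
def lsdv_param_count(n_subjects, n_periods, n_slopes,
                     subject_slopes=False, time_slopes=False):
    """Number of coefficients in a two-way LSDV model with a common intercept."""
    subject_dummies = n_subjects - 1     # one subject is the reference category
    time_dummies = n_periods - 1         # one period is the reference category
    k = 1 + subject_dummies + time_dummies + n_slopes
    if subject_slopes:                   # differential slopes by subject
        k += subject_dummies * n_slopes
    if time_slopes:                      # differential slopes by period
        k += time_dummies * n_slopes
    return k


N_OBS = 90
two_way = lsdv_param_count(6, 15, 3)
everything = lsdv_param_count(6, 15, 3, subject_slopes=True, time_slopes=True)
print(two_way, everything, N_OBS - everything)   # 23 80 10
```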

A Caution in the Use of the Fixed Effect LSDV Model 

As the preceding discussion suggests, the LSDV model has several problems that need to 
be borne in mind: 

First, if you introduce too many dummy variables, you will run up against the degrees of freedom problem. That is, you will lack enough observations to do a meaningful statistical analysis. Second, with many dummy variables in the model, both individual and interactive or multiplicative, there is always the possibility of multicollinearity, which might make precise estimation of one or more parameters difficult.

Third, in some situations the LSDV may not be able to identify the impact of time-invariant variables. Suppose we want to estimate a wage function for a group of workers using panel data. Besides wage, a wage function may include age, experience, and education as explanatory variables. Suppose we also decide to add sex, color, and ethnicity as additional variables in the model. Since these variables will not change over time for an individual subject, the LSDV approach may not be able to identify the impact of such time-invariant variables on wages. To put it differently, the subject-specific intercepts absorb all heterogeneity that may exist in the dependent and explanatory variables. Incidentally, the time-invariant variables are sometimes called nuisance variables or lurking variables.
Fourth, we have to think carefully about the error term $u_{it}$. The results we have presented in Eqs. (16.3.1) and (16.4.1) are based on the assumption that the error term follows the classical assumptions, namely, $u_{it} \sim N(0, \sigma^2)$. Since the index i refers to cross-section observations and t to time series observations, the classical assumption for $u_{it}$ may have to be modified. There are several possibilities, including:

1. We can assume that the error variance is the same for all cross-section units or we can assume that the error variance is heteroscedastic.⁵

2. For each entity, we can assume that there is no autocorrelation over time. Thus, in our illustrative example, we can assume that the error term of the cost function for airline #1 is non-autocorrelated, or we can assume that it is autocorrelated, say, of the AR(1) type.

3. For a given time, it is possible that the error term for airline #1 is correlated with the error term for, say, airline #2.⁶ Or we can assume that there is no such correlation.

There are also other combinations and permutations of the error term. As you will quickly realize, allowing one or more of these possibilities will make the analysis that much more complicated. (Space and mathematical demands preclude us from considering all the possibilities. The references in footnote 1 discuss some of these topics.) Some of these problems may be alleviated, however, if we consider the alternatives discussed in the next two sections.

16.5 The Fixed-Effect Within-Group (WG) Estimator

One way to estimate a pooled regression is to eliminate the fixed effect, $\beta_{1i}$, by expressing the values of the dependent and explanatory variables for each airline as deviations from their respective mean values. Thus, for airline #1 we will obtain the sample mean values of TC, Q, PF, and LF ($\overline{TC}_1$, $\bar{Q}_1$, $\overline{PF}_1$, and $\overline{LF}_1$, respectively) and subtract them from the individual values of these variables. The resulting values are called “de-meaned” or mean-corrected values. We do this for each airline and then pool all the (90) mean-corrected values and run an OLS regression.

Letting $tc_{it}$, $q_{it}$, $pf_{it}$, and $lf_{it}$ represent the mean-corrected values, we now run the regression:

$$tc_{it} = \beta_2 q_{it} + \beta_3 pf_{it} + \beta_4 lf_{it} + u_{it} \tag{16.5.1}$$

where $i = 1, 2, \ldots, 6$ and $t = 1, 2, \ldots, 15$. Note that Eq. (16.5.1) does not have an intercept term (why?).
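A minimal sketch of the de-meaning step, on a hypothetical toy panel of two airlines over three years (made-up numbers, not Table 16.1), also points to an answer to the "(why?)". Within each subject the mean-corrected values sum to zero, so every transformed variable has mean zero and an intercept would simply be estimated as zero. A variable that is constant within a subject, such as the time-invariant M of Eq. (16.3.2), becomes a column of zeros.

```python
from collections import defaultdict


def demean_by_group(groups, values):
    """Subtract each subject's own sample mean from that subject's observations."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [v - means[g] for g, v in zip(groups, values)]


groups = [1, 1, 1, 2, 2, 2]                  # airline id for each observation
tc = [10.0, 12.0, 14.0, 30.0, 33.0, 36.0]    # hypothetical total cost values
m_quality = [5.0, 5.0, 5.0, 8.0, 8.0, 8.0]   # hypothetical time-invariant variable

print(demean_by_group(groups, tc))           # [-2.0, 0.0, 2.0, -3.0, 0.0, 3.0]
print(demean_by_group(groups, m_quality))    # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```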

Returning to our example, we obtain the results in Table 16.4. Note: The prefix DM 
means that the values are mean-corrected or expressed as deviations from their sample 
means. 

Note the difference between the pooled regression given in Table 16.2 and the pooled regression in Table 16.4. The former simply ignores the heterogeneity among the six airlines, whereas the latter takes it into account, not by the dummy variable method, but by eliminating it by differencing sample observations around their sample means. The difference between the two is obvious, as shown in Figure 16.2.

5 STATA provides heteroscedasticity-corrected standard errors in the panel data regression models.

6 This leads to the so-called seemingly unrelated regression (SURE) model, originally proposed by Arnold Zellner. See A. Zellner, “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias,” Journal of the American Statistical Association, vol. 57, 1962, pp. 348-368.

TABLE 16.4

Dependent Variable: DMTC
Method: Least Squares
Sample: 1-90
Included observations: 90

Variable    Coefficient    Std. Error    t Statistic    Prob.
DMQ         3319023.       165339.8      20.07396       0.0000
DMPF        0.773071       0.093903      8.232630       0.0000
DMLF       -3797368.       592230.5     -6.411976       0.0000

R-squared             0.929366    Mean dependent var.    2.59E-11
Adjusted R-squared    0.927743    S.D. dependent var.    755325.8
S.E. of regression    203037.2    Durbin-Watson stat.    0.693287

FIGURE 16.2
The within-groups estimator.
Source: Alan Duncan, “Cross-Section and Panel Data Econometrics,” unpublished lecture notes (adapted).

It can be shown that the WG estimator produces consistent estimates of the slope coefficients, whereas the ordinary pooled regression may not. It should be added, however, that WG estimators, although consistent, are inefficient (i.e., have larger variances) compared to the ordinary pooled regression results.⁷ Observe that the slope coefficients of Q, PF, and LF are identical in Tables 16.3 and 16.4. This is because mathematically the two models are identical. Incidentally, the regression coefficients estimated by the WG method are called WG estimators.
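The claim that the LSDV and WG slopes coincide can be checked numerically. The sketch below uses simulated data (one regressor, six hypothetical subjects, 15 periods each; all parameter values are assumptions) and a small hand-written normal-equations solver, so it runs with the Python standard library alone. The LSDV regression includes an intercept plus five differential dummies, and the WG regression uses de-meaned data with no intercept. The two slopes should agree up to floating-point rounding.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * mc for a, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


random.seed(1)
groups, x, y = [], [], []
for i in range(6):                          # hypothetical subjects 0..5
    for t in range(15):
        xi = i + random.gauss(0, 1)
        groups.append(i)
        x.append(xi)
        y.append(5.0 - 2.0 * i + 1.5 * xi + random.gauss(0, 0.3))

# LSDV: intercept + dummies for subjects 1..5 (subject 0 is the reference) + x.
X_lsdv = [[1.0] + [1.0 if g == j else 0.0 for j in range(1, 6)] + [xi]
          for g, xi in zip(groups, x)]
beta_lsdv = ols(X_lsdv, y)[-1]


def demean(values):
    """Subtract each subject's mean (the within transformation)."""
    means = {}
    for g in set(groups):
        members = [v for gi, v in zip(groups, values) if gi == g]
        means[g] = sum(members) / len(members)
    return [v - means[g] for g, v in zip(groups, values)]


dx, dy = demean(x), demean(y)
beta_wg = sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)
print(f"LSDV slope: {beta_lsdv:.6f}, WG slope: {beta_wg:.6f}")
```

In practice a linear-algebra library would replace the hand-rolled solver. It is used here only to keep the sketch dependency-free.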

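One disadvantage of the WG estimator can be explained with the following wage regression model:

$$W_{it} = \beta_{1i} + \beta_2\,\text{Experience}_{it} + \beta_3\,\text{Age}_{it} + \beta_4\,\text{Gender}_{it} + \beta_5\,\text{Education}_{it} + \beta_6\,\text{Race}_{it} \tag{16.5.2}$$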

In this wage function, variables such as gender, education, and race are time-invariant. If we use the WG estimators, these time-invariant variables will be wiped out (because of differencing). As a result, we will not know how wage reacts to these time-invariant variables.⁸ But this is the price we have to pay to avoid the correlation between the error term ($\alpha_i$ included in $v_{it}$) and the explanatory variables.

7 The reason for this is that when we express variables as deviations from their mean values, the variation in these mean-corrected values will be much smaller than the variation in the original values of the variables. In that case, the variation in the disturbance term $u_{it}$ may be relatively large, thus leading to higher standard errors of the estimated coefficients.

Another disadvantage of the WG estimator is that, “... it may distort the parameter values and can certainly remove any long run effects.”⁹ In general, when we difference a variable, we remove the long-run component from that variable. What is left is the short-run value of that variable. We will discuss this further when we discuss time series econometrics later in the book.

In using LSDV we obtained direct estimates of the intercepts for each airline. How can we obtain the estimates of the intercepts using the WG method? For the airlines example, they are obtained as follows:

$$\hat{\alpha}_i = \overline{TC}_i - \hat{\beta}_2 \bar{Q}_i - \hat{\beta}_3 \overline{PF}_i - \hat{\beta}_4 \overline{LF}_i \tag{16.5.3}$$

where bars over the variables denote the sample mean values of the variables for the ith airline.

That is, we obtain the intercept value of the ith airline by subtracting from the mean value of the dependent variable the mean values of the explanatory variables for that airline times the estimated slope coefficients from the WG estimators. Note that the estimated slope coefficients remain the same for all of the airlines, as shown in Table 16.4. It may be noted that the intercept estimated in Eq. (16.5.3) is similar to the intercept we estimate in the standard linear regression model, which can be seen from Eq. (7.4.21). We leave it for the reader to find the intercepts of the six airlines in the manner shown and verify that they are the same as the intercept values derived in Table 16.3, save for rounding errors.
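A one-regressor version of Eq. (16.5.3) can be sketched on simulated data. The intercepts and slope below are hypothetical values chosen for the illustration, not the airline estimates. The WG slope is computed from de-meaned data, and each subject's intercept is then recovered as its mean of y minus the slope times its mean of x.

```python
import random

random.seed(7)
TRUE_ALPHA = [4.0, 6.0, 3.0, 8.0, 5.0, 7.0]   # hypothetical airline intercepts
TRUE_BETA = 1.5                               # hypothetical common slope

data = {i: [] for i in range(6)}              # airline -> list of (x, y) pairs
for i in range(6):
    for t in range(15):
        x = i + random.gauss(0, 1)
        data[i].append((x, TRUE_ALPHA[i] + TRUE_BETA * x + random.gauss(0, 0.1)))


def mean(values):
    return sum(values) / len(values)


# Group means of x and y for each airline.
xbar = {i: mean([x for x, _ in obs]) for i, obs in data.items()}
ybar = {i: mean([y for _, y in obs]) for i, obs in data.items()}

# WG slope: de-meaned y regressed on de-meaned x, pooled over airlines.
sxy = sum((x - xbar[i]) * (y - ybar[i]) for i, obs in data.items() for x, y in obs)
sxx = sum((x - xbar[i]) ** 2 for i, obs in data.items() for x, _ in obs)
beta_wg = sxy / sxx

# Eq. (16.5.3) with a single regressor: alpha_i = ybar_i - beta_wg * xbar_i.
alpha_hat = [ybar[i] - beta_wg * xbar[i] for i in range(6)]
print([round(a, 2) for a in alpha_hat])
```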

It may be noted that the estimated intercept of each airline represents the subject-specific characteristics of each airline, but we will not be able to identify these characteristics individually. Thus, the $\alpha_1$ intercept for airline #1 represents the management philosophy of that airline, the composition of its board of directors, the personality of the CEO, the gender of the CEO, etc. All these heterogeneity characteristics are subsumed in the intercept value. As we will see later, such characteristics can be included in the random effects model.

In passing, we note that an alternative to the WG estimator is the first-difference method. In the WG method, we express each variable as a deviation from that variable’s mean value. In the first-difference method, for each subject we take successive differences of the variables. Thus, for airline #1 we subtract the first observation of TC from the second observation of TC, the second observation of TC from the third observation of TC, and so on. We do this for each of the remaining variables and repeat this process for the remaining five airlines. After this process we have only 14 observations for each airline, since the first observation has no previous value. As a result, we now have 84 observations instead of the original 90 observations. We then regress the first-differenced values of the TC variable on the first-differenced values of the explanatory variables as follows:

$$\Delta TC_{it} = \beta_2 \Delta Q_{it} + \beta_3 \Delta PF_{it} + \beta_4 \Delta LF_{it} + (u_{it} - u_{i,t-1}), \qquad i = 1, 2, \ldots, 6; \quad t = 2, 3, \ldots, 15 \qquad (16.5.4)$$

where $\Delta TC_{it} = TC_{it} - TC_{i,t-1}$, and similarly for the other variables. As noted in Chapter 11, $\Delta$ is called the first-difference operator. 10

8 This is also true of the LSDV model. 

9 Dimitrios Asteriou and Stephen C. Hall, Applied Econometrics: A Modern Approach, Palgrave 
Macmillan, New York, 2007, p. 347. 

10 Notice that Eq. (16.5.4) has no intercept term (why?), but we can include it if there is a trend variable in the original model.


602 Part Three Topics in Econometrics 


In passing, note that the original disturbance term is now replaced by the difference between the current and previous values of the disturbance term. If the original disturbance term is not autocorrelated, the transformed disturbance is: with serially uncorrelated, homoscedastic $u_{it}$, successive differences $(u_{it} - u_{i,t-1})$ and $(u_{i,t+1} - u_{it})$ share the term $u_{it}$ and have a correlation of $-0.5$. It therefore poses the kinds of estimation problems that we discussed in Chapter 11. However, if the explanatory variables are strictly exogenous, the first-difference estimator is unbiased, given the values of the explanatory variables. Also note that the first-difference method has the same disadvantages as the WG method in that the explanatory variables that remain fixed over time for an individual are wiped out in the first-difference transformation.

It may be pointed out that the first-difference and fixed effects estimators are the same when we have only two time periods, but if there are more than two periods, these estimators differ. The reasons for this are rather involved and the interested reader may consult the references. 11 It is left as an exercise for the reader to apply the first-difference method to our airlines example and compare the results with the other fixed effects estimators.
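Two claims above can be checked numerically: first differencing a panel of 6 units and 15 periods leaves 84 observations, and the first-difference and WG slopes coincide when T = 2 but generally not when T > 2. The sketch below uses a single regressor and a simulated panel; the data-generating process (a slope of 1.5 with random unit intercepts) and the helper names are illustrative assumptions, not the airline data. The first-difference regression is run through the origin, as in Eq. (16.5.4).

```python
import random


def first_difference(panel):
    """Return (dx, dy) pairs for successive periods within each unit."""
    pairs = []
    for obs in panel.values():
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            pairs.append((x1 - x0, y1 - y0))
    return pairs


def fd_slope(panel):
    # OLS through the origin on the differenced data (no intercept)
    pairs = first_difference(panel)
    return sum(dx * dy for dx, dy in pairs) / sum(dx * dx for dx, _ in pairs)


def wg_slope(panel):
    # OLS on deviations from each unit's own means
    num = den = 0.0
    for obs in panel.values():
        xbar = sum(x for x, _ in obs) / len(obs)
        ybar = sum(y for _, y in obs) / len(obs)
        num += sum((x - xbar) * (y - ybar) for x, y in obs)
        den += sum((x - xbar) ** 2 for x, _ in obs)
    return num / den


def simulate(n_units, n_periods, seed=0):
    # Hypothetical panel: y = alpha_i + 1.5 x + noise
    rng = random.Random(seed)
    panel = {}
    for i in range(n_units):
        alpha = rng.gauss(0, 2)
        obs = []
        for _ in range(n_periods):
            x = rng.gauss(0, 1)
            obs.append((x, alpha + 1.5 * x + rng.gauss(0, 1)))
        panel[i] = obs
    return panel


two = simulate(6, 2)
many = simulate(6, 15)
print(len(first_difference(many)))    # 6 units x 14 differences
print(fd_slope(two) - wg_slope(two))  # zero up to rounding when T = 2
print(fd_slope(many) - wg_slope(many))  # generally nonzero when T > 2
```

With T = 2 the demeaned values are just plus or minus half the first differences, which is why the two estimators are algebraically identical in that case.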

16.6 The Random Effects Model (REM) 

Commenting on fixed effect, or LSDV, modeling, Kmenta writes: 12

An obvious question in connection with the covariance [i.e., LSDV] model is whether the inclusion of the dummy variables, and the consequent loss of the number of degrees of freedom, is really necessary. The reasoning underlying the covariance model is that in specifying the regression model we have failed to include relevant explanatory variables that do not change over time (and possibly others that do change over time but have the same value for all cross-sectional units), and that the inclusion of dummy variables is a coverup of our ignorance.

If the dummy variables do in fact represent a lack of knowledge about the (true) model, 
why not express this ignorance through the disturbance term? This is precisely the approach 
suggested by the proponents of the so-called error components model (ECM) or random 
effects model (REM), which we will now illustrate with our airline cost function. 

The basic idea is to start with Eq. (16.4.1):

$$TC_{it} = \beta_{1i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \qquad (16.6.1)$$

Instead of treating $\beta_{1i}$ as fixed, we assume that it is a random variable with a mean value of $\beta_1$ (no subscript i here). The intercept value for an individual company can be expressed as

$$\beta_{1i} = \beta_1 + \varepsilon_i \qquad (16.6.2)$$

where $\varepsilon_i$ is a random error term with a mean value of zero and a variance of $\sigma_\varepsilon^2$.

What we are essentially saying is that the six firms included in our sample are a drawing from a much larger universe of such companies and that they have a common mean value for the intercept ($= \beta_1$). The individual differences in the intercept values of each company are reflected in the error term $\varepsilon_i$.

Substituting Eq. (16.6.2) into Eq. (16.6.1), we obtain:

$$\begin{aligned} TC_{it} &= \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + \varepsilon_i + u_{it} \\ &= \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + w_{it} \end{aligned} \qquad (16.6.3)$$

where

$$w_{it} = \varepsilon_i + u_{it} \qquad (16.6.4)$$

11 See in particular Jeffrey M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, MIT 
Press, Cambridge, Mass., 2002, pp. 279-283. 

12 Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, p. 633. 
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The composite error term w lt consists of two components: s, , which is the cross-section, 
or individual-specific, error component, and u lt , which is the combined time series and 
cross-section error component and is sometimes called the idiosyncratic term because it 
varies over cross-section (i.e., subject) as well as time. The error components model (ECM) 
is so named because the composite error term consists of two (or more) error components. 

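The usual assumptions made by the ECM are that

$$\begin{aligned} &\varepsilon_i \sim N(0, \sigma_\varepsilon^2) \\ &u_{it} \sim N(0, \sigma_u^2) \\ &E(\varepsilon_i u_{it}) = 0; \quad E(\varepsilon_i \varepsilon_j) = 0 \quad (i \neq j) \\ &E(u_{it} u_{is}) = E(u_{it} u_{jt}) = E(u_{it} u_{js}) = 0 \quad (i \neq j;\ t \neq s) \end{aligned} \qquad (16.6.5)$$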


that is, the individual error components are not correlated with each other and are not autocorrelated across both cross-section and time series units. It is also very important to note that $w_{it}$ is assumed to be uncorrelated with any of the explanatory variables included in the model. Since $\varepsilon_i$ is a component of $w_{it}$, it is possible that the latter is correlated with the explanatory variables. If that is indeed the case, the ECM will result in inconsistent estimation of the regression coefficients. Shortly, we will discuss the Hausman test, which will tell us in a given application if $w_{it}$ is correlated with the explanatory variables, that is, whether ECM is the appropriate model.

Notice carefully the difference between FEM and ECM. In FEM each cross-sectional unit has its own (fixed) intercept value, N such values in all for N cross-sectional units. In ECM, on the other hand, the (common) intercept represents the mean value of all the (cross-sectional) intercepts and the error component $\varepsilon_i$ represents the (random) deviation of the individual intercept from this mean value. Keep in mind, however, that $\varepsilon_i$ is not directly observable; it is what is known as an unobservable, or latent, variable.

As a result of the assumptions stated in Eq. (16.6.5), it follows that

$$E(w_{it}) = 0 \qquad (16.6.6)$$

$$\operatorname{var}(w_{it}) = \sigma_\varepsilon^2 + \sigma_u^2 \qquad (16.6.7)$$

Now if $\sigma_\varepsilon^2 = 0$, there is no difference between models (16.3.1) and (16.6.3) and we can simply pool all the (cross-sectional and time series) observations and run the pooled regression, as we did in Eq. (16.3.1). This is true because in this situation there are either no subject-specific effects or they have all been accounted for in the explanatory variables.

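As Eq. (16.6.7) shows, the error term is homoscedastic. However, it can be shown that $w_{it}$ and $w_{is}$ ($t \neq s$) are correlated; that is, the error terms of a given cross-sectional unit at two different points in time are correlated. Their covariance is $\sigma_\varepsilon^2$, because $\varepsilon_i$ appears in both periods while the $u$ terms are uncorrelated. The correlation coefficient, $\operatorname{corr}(w_{it}, w_{is})$, is therefore as follows:

$$\rho = \operatorname{corr}(w_{it}, w_{is}) = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + \sigma_u^2}, \qquad t \neq s \qquad (16.6.8)$$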

Notice two special features of the preceding correlation coefficient. First, for any given cross-sectional unit, the value of the correlation between error terms at two different times remains the same no matter how far apart the two time periods are, as is clear from Eq. (16.6.8). This is in strong contrast to the first-order [AR(1)] scheme that we discussed in Chapter 12, where we found that the correlation between periods declines as the two periods move further apart. Second, the correlation structure given in Eq. (16.6.8) remains the same for all cross-sectional units; that is, it is identical for all subjects.
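Eq. (16.6.8) is easy to evaluate once the variance components have been estimated. The sketch below plugs in the S.D. values printed in the Effects Specification panels of Tables 16.5, 16.9, and 16.12 (the function name is illustrative); the results match the cross-section Rho values printed there (0.2067, 0.8655, and 0.3741), and the idiosyncratic Rho is simply 1 minus the cross-section value.

```python
def rho(sd_cross_section, sd_idiosyncratic):
    """Eq. (16.6.8): share of var(w_it) due to the individual component."""
    var_eps = sd_cross_section ** 2
    var_u = sd_idiosyncratic ** 2
    return var_eps / (var_eps + var_u)


# S.D. values as printed in the Effects Specification panels
print(round(rho(107411.2, 210422.8), 4))  # airline cost function, Table 16.5
print(round(rho(0.130128, 0.051303), 4))  # productivity, Table 16.9
print(round(rho(0.123560, 0.159816), 4))  # electricity demand, Table 16.12
```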

If we do not take this correlation structure into account, and estimate Eq. (16.6.3) by 
OLS, the resulting estimators will be inefficient. The most appropriate method here is the 
method of generalized least squares (GLS). 
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TABLE 16.5 


Dependent Variable: TC 

Method: Panel EGLS (Cross-section random effects) 


Sample: 1-15 

Periods included: 15 

Cross-sections included: 6 

Total panel (balanced) observations: 90 

Swamy and Arora estimator of component variances 


Coefjpjgient ifeiS, Error 

| Statistic 1 

Prob. 

C 107429.3 3fiM6.2 

Q 2288588. 88172.77 
PF 1.123591 0.083298 
LF -3084994. 584373.2 

3.534251 

2 5. f15:72 
13.488ft' 
-5.279151 

fvOGOf 

0.0000 

Q.OOOf- 

0.0000 

Effect® Spec: 

ification 

s» a* 

Rho 

Cross-section random 

Idiosyncratic random 

107411.2 

210422.8 

0.2067 

0.7933 

F§§Ui Effect 



yjjjl. OOOOQQ -270615.0 

2 2.000000 -87061.32 

3 3.000000 -21338.40 

4 4.000000 187142.9 

5 S.000000 134488.9 

6 6.000000 57383.00 




We will not discuss the mathematics of GLS in the present context because of its complexity. 13 Since most modern statistical software packages now have routines to estimate ECM (as well as FEM), we will present the results for our illustrative example only. But before we do that, it may be noted that we can easily extend Eq. (16.4.2) to allow for a random error component to take into account variation over time (see Exercise 16.6).

The results of ECM estimation of the airline cost function are presented in Table 16.5. 

Notice these features of the REM. The (average) intercept value is 107429.3. The (differential) intercept values of the six entities are given at the bottom of the regression results. Firm number 1, for example, has an intercept value which is 270615 units lower than the common intercept value of 107429.3; the actual value of the intercept for this airline is then -163185.7. On the other hand, the intercept value of firm number 6 is higher by 57383 units than the common intercept value; the actual intercept value for this airline is (107429.3 + 57383), or 164812.3. The intercept values for the other airlines can be derived similarly. However, note that if you add the (differential) intercept values of all the six airlines, the sum is 0, as it should be (why?).
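The intercept arithmetic above can be reproduced directly from the values printed in Table 16.5. This short sketch adds each firm's differential effect to the common intercept and sums the effects; the small nonzero total (about 0.08) reflects only rounding in the printed figures.

```python
common_intercept = 107429.3  # average intercept, Table 16.5

# Differential (random) effects by firm, Table 16.5
effects = {1: -270615.0, 2: -87061.32, 3: -21338.40,
           4: 187142.9, 5: 134488.9, 6: 57383.00}

intercepts = {firm: round(common_intercept + e, 2) for firm, e in effects.items()}
print(intercepts)
print(round(sum(effects.values()), 2))  # close to zero apart from rounding
```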

If you compare the results of the fixed-effect and random-effect regressions, you will see that there are substantial differences between the two. The important question now is: Which results are reliable? Or, to put it differently, which of the two models should we choose? We can apply the Hausman test to shed light on this question.

The null hypothesis underlying the Hausman test is that the FEM and ECM estimators do not differ substantially. The test statistic developed by Hausman has an asymptotic $\chi^2$ distribution. If the null hypothesis is rejected, the conclusion is that the ECM is not appropriate because the random effects are probably correlated with one or more regressors. In this case, FEM is preferred to ECM. For our example, the results of the Hausman test are as shown in Table 16.6.

TABLE 16.6

Correlated Random Effects - Hausman Test
Equation: Untitled
Test cross-section random effects

Test Summary              Chi-Sq. Statistic    Chi-Sq. d.f.    Prob.
Cross-section random          49.619687             3          0.0000

Cross-section random effects test comparisons:

Variable        Fixed           Random          Var(Diff.)       Prob.
Q            3319023.28      2288587.95      21587779733.      0.0000
PF             0.773071        1.123591         0.002532       0.0000
LF          -3797367.59     -3084994.0       35225469544.      0.0001

13 See Kmenta, op. cit., pp. 625-630.

The Hausman test clearly rejects the null hypothesis, for the estimated $\chi^2$ value for 3 df is highly significant; if the null hypothesis were true, the probability of obtaining a chi-square value of 49.62 or greater would be practically zero. As a result, we can reject the ECM (REM) in favor of FEM. Incidentally, the last part of the preceding table compares the fixed-effect and random-effect coefficients of each variable and, as the last column shows, in the present example the differences are statistically significant.
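Each row of the comparison part of the Hausman output is a one-degree-of-freedom statistic, $(b_{FE} - b_{RE})^2 / \text{Var(Diff.)}$, referred to a chi-square distribution with 1 df. The sketch below recomputes those rows from the values printed in Table 16.6 (helper names are illustrative). It cannot reproduce the joint statistic of 49.62, which also requires the covariances between the coefficient differences that the table does not print. The fixed-effects PF value, 0.773071, is read from a partly garbled table entry, so treat that row as approximate.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def hausman_by_variable(rows):
    """rows: name -> (fixed coefficient, random coefficient, Var(Diff.))."""
    out = {}
    for name, (b_fe, b_re, var_diff) in rows.items():
        stat = (b_fe - b_re) ** 2 / var_diff
        out[name] = (stat, chi2_sf_1df(stat))
    return out


# Values as printed in Table 16.6 (airline cost function)
table_16_6 = {
    "Q": (3319023.28, 2288587.95, 21587779733.0),
    "PF": (0.773071, 1.123591, 0.002532),
    "LF": (-3797367.59, -3084994.0, 35225469544.0),
}
results = hausman_by_variable(table_16_6)
for name, (stat, p) in results.items():
    print(f"{name}: chi-square(1) = {stat:.2f}, p = {p:.4f}")
```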

Breusch and Pagan Lagrange Multiplier Test 14 

Besides the Hausman test, we can also use the Breusch-Pagan (BP) test to test the hypothesis that there are no random effects, i.e., that $\sigma_\varepsilon^2$ in Eq. (16.6.7) is zero. This test is built into software packages such as STATA. Under the null hypothesis, BP follows a chi-square distribution with 1 df; there is only 1 df because we are testing the single hypothesis that $\sigma_\varepsilon^2 = 0$. We will not present the formula underlying the test, for it is rather complicated.

Turning to our airlines example, an application of the BP test produces a chi-square value of 0.61. With 1 df, the p value of obtaining a chi-square value of 0.61 or greater is about 43 percent. Therefore, we do not reject the null hypothesis. In other words, the random effects model is not appropriate in the present example. The BP test thus reinforces the Hausman test, which also found that the random effects model is not appropriate for our airlines example.
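For readers who want to see the mechanics anyway, here is a minimal sketch of the common textbook form of the Breusch-Pagan LM statistic for a balanced panel, computed from pooled OLS residuals $e_{it}$: $LM = \frac{NT}{2(T-1)}\left[\frac{\sum_i (\sum_t e_{it})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2$. If the residuals of each unit tend to share the same sign, which is what an individual effect produces, the unit totals are large and the bracketed ratio exceeds 1. Software packages may implement variants of this formula, so treat it as an illustration rather than a reproduction of any particular routine; the helper names are hypothetical. The last line checks the p value quoted above for the airline statistic of 0.61.

```python
import math


def breusch_pagan_lm(residuals):
    """LM statistic for H0: sigma_eps^2 = 0 in a balanced panel.

    residuals: one list per cross-sectional unit, each holding that
    unit's T pooled-OLS residuals (same T for every unit).
    """
    n_units = len(residuals)
    n_periods = len(residuals[0])
    sum_sq_unit_totals = sum(sum(e) ** 2 for e in residuals)
    sum_sq = sum(v * v for e in residuals for v in e)
    scale = n_units * n_periods / (2.0 * (n_periods - 1))
    return scale * (sum_sq_unit_totals / sum_sq - 1.0) ** 2


def chi2_sf_1df(x):
    """P(chi-square with 1 df > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


# p value for the BP statistic of 0.61 reported for the airline data
print(round(chi2_sf_1df(0.61), 3))
```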

16.7 Properties of Various Estimators 15 

We have discussed several methods of estimating (linear) panel regression models, namely, 
pooled estimators, fixed effects estimators that include least squares dummy variable (LSDV) 
estimators, fixed-effect within-group estimators, first-difference estimators, and random effects 
estimators. What are their statistical properties? Since panel data generally involve a large num¬ 
ber of observations, we will concentrate on the consistency property of these estimators. 

14 T. Breusch and A. R. Pagan, "The Lagrange Multiplier Test and Its Application to Model Specifica¬ 
tion in Econometrics," Review of Economic Studies, vol. 47, 1980, pp. 239-253. 

15 The following discussion draws on A. Colin Cameron and Pravin K. Trivedi, Microeconometrics: 
Methods and Applications, Cambridge University Press, Cambridge, New York, 2005, Chapter 21. 
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Pooled Estimators 

Assuming the slope coefficients are constant across subjects, if the error term in Eq. (16.3.1) 
is uncorrelated with the regressors, pooled estimators are consistent. However, as noted 
earlier, the error terms are likely to be correlated over time for a given subject. Therefore, 
panel-corrected standard errors must be used for hypothesis testing. Make sure the 
statistical package you use has this facility; otherwise the computed standard errors may
be underestimated. It should be noted that if the fixed effects model is appropriate but we 
use the pooled estimator, the estimated coefficients will be inconsistent. 

Fixed Effects Estimators 

The fixed effects estimators are consistent even if the true model is the pooled model or the random effects model.

Random Effects Estimators 

The random effects estimator is consistent even if the true model is the pooled model. However, if the true model is fixed effects, the random effects estimator is inconsistent.

For proofs and further details about these properties, refer to the textbooks of Cameron 
and Trivedi, Greene, and Wooldridge cited in the footnotes. 
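The consistency properties above can be illustrated with a small simulation. The data-generating process is hypothetical and not taken from the text: the unit effect $\alpha_i$ enters both $y_{it}$ and $x_{it}$, so the fixed effects model is the appropriate one. Under these assumptions the pooled OLS slope converges to 2.5 rather than the true value of 2, because $\operatorname{cov}(x, \alpha)/\operatorname{var}(x) = 1/2$, while the WG (fixed effects) slope stays close to 2.

```python
import random


def simulate(n_units=500, n_periods=5, seed=1):
    # Hypothetical DGP: alpha_i is correlated with x_it (x_it = alpha_i + noise)
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        alpha = rng.gauss(0, 1)
        obs = []
        for _ in range(n_periods):
            x = alpha + rng.gauss(0, 1)
            y = 1.0 + 2.0 * x + alpha + rng.gauss(0, 1)  # true slope is 2
            obs.append((x, y))
        panel.append(obs)
    return panel


def pooled_slope(panel):
    # OLS with a common intercept, ignoring the panel structure
    data = [pair for obs in panel for pair in obs]
    xbar = sum(x for x, _ in data) / len(data)
    ybar = sum(y for _, y in data) / len(data)
    num = sum((x - xbar) * (y - ybar) for x, y in data)
    den = sum((x - xbar) ** 2 for x, _ in data)
    return num / den


def wg_slope(panel):
    # Deviations from each unit's means wipe out alpha_i
    num = den = 0.0
    for obs in panel:
        xbar = sum(x for x, _ in obs) / len(obs)
        ybar = sum(y for _, y in obs) / len(obs)
        num += sum((x - xbar) * (y - ybar) for x, y in obs)
        den += sum((x - xbar) ** 2 for x, _ in obs)
    return num / den


panel = simulate()
print(round(pooled_slope(panel), 3), round(wg_slope(panel), 3))
```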

16.8 Fixed Effects versus Random Effects Model: Some Guidelines 

The challenge facing a researcher is: Which model is better, FEM or ECM? The answer to this question hinges on the assumption we make about the likely correlation between the individual, or cross-section specific, error component $\varepsilon_i$ and the X regressors.

If it is assumed that $\varepsilon_i$ and the Xs are uncorrelated, ECM may be appropriate, whereas if $\varepsilon_i$ and the Xs are correlated, FEM may be appropriate.

The assumption underlying ECM is that the $\varepsilon_i$ are random drawings from a much larger population, but sometimes this may not be so. For example, suppose we want to study the crime rate across the 50 states in the United States. Obviously, in this case, the assumption that the 50 states are a random sample is not tenable.

Keeping this fundamental difference in the two approaches in mind, what more can we say about the choice between FEM and ECM? Here the observations made by Judge et al. may be helpful: 16

1. If T (the number of time periods) is large and N (the number of cross-sectional units) is small, there is likely to be little difference in the values of the parameters estimated by FEM and ECM. Hence the choice here is based on computational convenience. On this score, FEM may be preferable.

2. When N is large and T is small (i.e., a short panel), the estimates obtained by the two methods can differ significantly. Recall that in ECM $\beta_{1i} = \beta_1 + \varepsilon_i$, where $\varepsilon_i$ is the cross-sectional random component, whereas in FEM we treat $\beta_{1i}$ as fixed and not random. In the latter case, statistical inference is conditional on the observed cross-sectional units in the sample. This is appropriate if we strongly believe that the individual, or cross-sectional, units in our sample are not random drawings from a larger population. In that case, FEM is appropriate. If the cross-sectional units in the sample are regarded as random drawings, however, then ECM is appropriate, for in that case statistical inference is unconditional.

3. If the individual error component $\varepsilon_i$ and one or more regressors are correlated, then the ECM estimators are biased, whereas those obtained from FEM are unbiased.

16 Judge et al., op. cit., pp. 489-491.

4. If N is large and T is small, and if the assumptions underlying ECM hold, ECM estimators are more efficient than FEM estimators.

5. Unlike FEM, ECM can estimate coefficients of time-invariant variables such as gender 
and ethnicity. The FEM does control for such time-invariant variables, but it cannot 
estimate them directly, as is clear from the LSDV or within-group estimator models. On 
the other hand, FEM controls for all time-invariant variables (why?), whereas ECM can 
estimate only such time-invariant variables as are explicitly introduced in the model. 
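Guideline 1 can be made concrete with the quasi-demeaning form of the random effects GLS estimator found in standard references such as Wooldridge: ECM is equivalent to OLS on $y_{it} - \theta \bar{y}_i$ and similarly transformed regressors, with $\theta = 1 - \sqrt{\sigma_u^2 / (\sigma_u^2 + T\sigma_\varepsilon^2)}$. When $\theta = 0$ this is the pooled regression, and as $\theta \to 1$ it approaches the WG (fixed effects) estimator. The sketch below treats the S.D. estimates in Table 16.5 as if they were the true variance components (an assumption) and evaluates $\theta$ for the actual T = 15 and for hypothetical other values of T, showing it rising toward 1 as T grows. The function name is illustrative.

```python
import math


def re_theta(sd_cross_section, sd_idiosyncratic, n_periods):
    """theta = 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_eps^2))."""
    var_eps = sd_cross_section ** 2
    var_u = sd_idiosyncratic ** 2
    return 1.0 - math.sqrt(var_u / (var_u + n_periods * var_eps))


# Variance components from Table 16.5; T = 15 is the airline panel,
# the other T values are hypothetical
for T in (2, 15, 100, 1000):
    print(T, round(re_theta(107411.2, 210422.8, T), 4))
```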

Despite the Hausman test, it is important to keep in mind the warning sounded by 
Johnston and DiNardo. In deciding between fixed effects or random effects models, they 
argue that, “ . . . there is no simple rule to help the researcher navigate past the Scylla of 
fixed effects and the Charybdis of measurement error and dynamic selection. Although 
they are an improvement over cross-section data, panel data do not provide a cure-all for all 
of an econometrician’s problems.” 17 

16.9 Panel Data Regressions: Some Concluding Comments 

As noted at the outset, the topic of panel data modeling is vast and complex. We have barely 
scratched the surface. The following are among the many topics we have not discussed. 

1. Hypothesis testing with panel data. 

2. Heteroscedasticity and autocorrelation in ECM. 

3. Unbalanced panel data. 

4. Dynamic panel data models in which the lagged value(s) of the regressand appear as explanatory variables.

5. Simultaneous equations involving panel data. 

6. Qualitative dependent variables and panel data. 

7. Unit roots in panel data (on unit roots, see Chapter 21). 

One or more of these topics can be found in the references cited in this chapter, and the 
reader is urged to consult them to learn more about this topic. These references also cite 
several empirical studies in various areas of business and economics that have used panel 
data regression models. The beginner is well-advised to read some of these applications to 
get a feel for how researchers have actually implemented such models. 18 

16.10 Some Illustrative Examples 


EXAMPLE 16.1
Productivity and Public Investment

To find out why productivity has declined and what the role of public investment is, Alicia Munnell studied productivity data in the 48 continental states of the United States for the 17 years from 1970 to 1986, for a total of 816 observations. 19 Using these data, we estimated the pooled regression in Table 16.7. Note that this regression does not take into account the panel nature of the data. The dependent variable in this model is GSP (gross state product), and the explanatory variables are: PRIVCAP (private capital), PUBCAP (public capital), WATER (water utility capital), and UNEMP (unemployment rate). Note: L stands for natural log.

17 Jack Johnston and John DiNardo, Econometric Methods, 4th ed., McGraw-Hill, 1997, p. 403.

18 For further details and concrete applications, see Paul D. Allison, Fixed Effects Regression Methods for 
Longitudinal Data, Using SAS, SAS Institute, Cary, North Carolina, 2005. 

19 The Munnell data can be found at www.aw-bc.com/murray. 







TABLE 16.7

Dependent Variable: LGSP
Method: Panel Least Squares
Sample: 1970-1986
Periods included: 17
Cross-sections included: 48
Total panel (balanced) observations: 816

              Coefficient    Std. Error    t Statistic    Prob.
C              0.907604       0.091328      9.937854     0.0000
LPRIVCAP       0.376011       0.027753     13.54847      0.0000
LPUBCAP        0.351478       0.016162     21.74758      0.0000
LWATER         0.312959       0.018739     16.70062      0.0000
LUNEMP        -0.069886       0.015092     -4.630528     0.0000

R-squared             0.981624    Mean dependent var.     10.50885
Adjusted R-squared    0.981533    S.D. dependent var.      1.021132
S.E. of regression    0.138765    F-statistic          10830.51
Sum squared resid.   15.61630     Prob. (F-statistic)      0.000000
Log likelihood      456.2346      Durbin-Watson stat.      0.063016

All the variables have the expected signs and all are individually, as well as collectively, statistically significant, assuming all the assumptions of the classical linear regression model hold true.

To take into account the panel dimension of the data, in Table 16.8 we estimated a fixed effects model using 47 dummies for the 48 states to avoid falling into the dummy-variable trap. To save space, we only present the estimated regression coefficients and not the individual dummy coefficients. But it should be added that all of the 47 state dummies were individually highly statistically significant.

TABLE 16.8

Dependent Variable: LGSP
Method: Panel Least Squares
Sample: 1970-1986
Periods included: 17
Cross-sections included: 48
Total panel (balanced) observations: 816

              Coefficient    Std. Error    t Statistic    Prob.
C             -0.033235       0.208648     -0.159286     0.8735
LPRIVCAP       0.267096       0.037015      7.215864     0.0000
LPUBCAP        0.714094       0.026520     26.92636      0.0000
LWATER         0.088272       0.021581      4.090291     0.0000
LUNEMP        -0.138854       0.007851    -17.68611      0.0000

Effects Specification
Cross-section fixed (dummy variables)

R-squared             0.997634    Mean dependent var.     10.50885
Adjusted R-squared    0.997476    S.D. dependent var.      1.021132
S.E. of regression    0.051303    F-statistic           6515.897
Sum squared resid.    2.010854    Prob. (F-statistic)      0.000000
Log likelihood     1292.535       Durbin-Watson stat.      0.520682

You can see that there are substantial differences between the pooled regression and the fixed-effects regression, casting doubt on the results of the pooled regression.

To see if the random effects model is more appropriate in this case, we present the results of the random effects regression model in Table 16.9.

TABLE 16.9

Dependent Variable: LGSP
Method: Panel EGLS (Cross-section random effects)
Sample: 1970-1986
Periods included: 17
Cross-sections included: 48
Total panel (balanced) observations: 816
Swamy and Arora estimator of component variances

              Coefficient    Std. Error    t Statistic    Prob.
C             -0.046176       0.161637     -0.285680     0.7752
LPRIVCAP       0.313980       0.029740     10.55940      0.0000
LPUBCAP        0.641926       0.023330     27.51514      0.0000
LWATER         0.130768       0.020281      6.447875     0.0000
LUNEMP        -0.139820       0.007442    -18.78669      0.0000

Effects Specification
                              S.D.        Rho
Cross-section random        0.130128     0.8655
Idiosyncratic random        0.051303     0.1345

To choose between the two models, we use the Hausman test, which gives the results shown in Table 16.10.

TABLE 16.10

Test Summary              Chi-Sq. Statistic    Chi-Sq. d.f.    Prob.
Cross-section random          42.458353             4          0.0000

Cross-section random effects test comparisons:

Variable        Fixed         Random       Var(Diff.)    Prob.
LPRIVCAP      0.267096      0.313980      0.000486     0.0334
LPUBCAP       0.714094      0.641926      0.000159     0.0000
LWATER        0.088272      0.130768      0.000054     0.0000
LUNEMP       -0.138854     -0.139820      0.000006     0.6993

Since the estimated chi-square value is highly statistically significant, we reject the hypothesis that there is no significant difference in the estimated coefficients of the two models. It seems there is correlation between the error term and one or more regressors. Hence, we can reject the random effects model in favor of the fixed effects model. Note, however, as the last part of Table 16.10 shows, not all coefficients differ in the two models. For example, there is not a statistically significant difference in the values of the LUNEMP coefficient in the two models.
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EXAMPLE 16.2 

Demand for 
Electricity 
in the USA 


TABLE 16.11 


In their article, Maddala et al. considered the demand for residential electricity and natural 
gas in 49 states in the USA for the period 1970-1990; Hawaii was not included in the 
analysis. 20 They collected data on several variables; these data can be found on the book's 
website. In this example, we will only consider the demand for residential electricity. We 
first present the results based on the fixed effects estimation (Table 16.11) and then the 
random effects estimation (Table 16.12), followed by a comparison of the two models. 


Dependent Variable: Log(ESRCBPC) 

Method: Panel Least Squares 

Sample: 1971-1990 

Periods included: 20 

Cross-sections included: 49 

Total panel (balanced) observations: 980 



Coeffllltent 

Std. Error 

r Statistic 

Prob. 

c 

-12.55764 

0.363436 

-34.55249 

0.0000 

Log(RESRCD) 

-0.628967 

0,029089 

-21.62236 

0.0000 

Log(YDPC) 

1.062439 

0.040280 

26.37663 

0.0044 

‘affects Specification 

Cross-section 

fixed (dummy variables) 




R-squared 

4.7S7600 

Mean dependent var. 

-4.53618? 

Adjusted R-squared 

1,744553 

S.D. dependent var. 

4,31620$ 

S.E. of regression 

4,159816 

Akaike info criterion 

-0.778954 

Sum squared resid. 

23.72762 

Schwarz criterion 

-0.524602 

Log likelihood 

432.6876 

Hannan-Quinn tttiter. 

-0.682188 

F-stat.istic 

Prob. (F- s t atistic) 

58.07007 

0.000000 

Burbin-Watson state. 

0.404314 


where Log(ESRCBPC) = natural log of residential electricity consumption per capita (in billion Btu), Log(RESRCD) = natural log of the real 1987 electricity price, and Log(YDPC) = natural log of real 1987 disposable income per capita.

Since this is a double-log model, the estimated slope coefficients represent elasticities. Thus, holding other things the same, if real per capita income goes up by 1 percent, the mean consumption of electricity goes up by about 1 percent. Likewise, holding other things constant, if the real price of electricity goes up by 1 percent, the average consumption of electricity goes down by about 0.6 percent. All the estimated elasticities are statistically significant.
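Because the model is double-log, a slope coefficient b implies that a 1 percent rise in X multiplies Y by $1.01^b$. The sketch below computes this exact percentage change from the fixed-effects elasticities in Table 16.11 (the function name is illustrative). For a change as small as 1 percent the result is very close to b itself, which is why the coefficients can be read directly as percentage effects.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in Y implied by ln Y = c + b ln X for a given % change in X."""
    return 100.0 * ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0)


# Fixed-effects elasticities from Table 16.11
print(round(pct_change_in_y(1.062439, 1.0), 3))   # 1% rise in real income per capita
print(round(pct_change_in_y(-0.628967, 1.0), 3))  # 1% rise in the real price
```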

The results of the random effects model are as shown in Table 16.12.

20 G. S. Maddala, Robert P. Trost, Hongyi Li, and Frederick Joutz, "Estimation of Short-run and Long-run Elasticities of Demand from Panel Data Using Shrinkage Estimators," Journal of Business and Economic Statistics, vol. 15, no. 1, January 1997, pp. 90-100.

TABLE 16.12

Dependent Variable: Log(ESRCBPC)
Method: Panel EGLS (Cross-section random effects)
Sample: 1971-1990
Periods included: 20
Cross-sections included: 49
Total panel (balanced) observations: 980
Swamy and Arora estimator of component variances

                 Coefficient    Std. Error    t Statistic    Prob.
C                -11.68536      0.353285     -33.07631      0.0000
Log(RESRCD)       -0.665570     0.028088     -23.696        0.0000
Log(YDPC)          0.980877     0.039257      24.98617      0.0000

Effects Specification
                              S.D.        Rho
Cross-section random        0.123560     0.3741
Idiosyncratic random        0.159816     0.6259

Weighted Statistics
R-squared              0.462591    Mean dependent var.      -1.260296
Adjusted R-squared     0.461491    S.D. dependent var.       0.229066
S.E. of regression     0.168096    Sum squared resid.       27.60641
F-statistic          420.4906      Durbin-Watson stat.       0.345453
Prob. (F-statistic)    0.000000

Unweighted Statistics
R-squared              0.267681    Mean dependent var.      -4.536187
Sum squared resid.    71.68384     Durbin-Watson stat.       0.133039

It seems that there is not much difference in the two models. But we can use the Hausman test to find out if this is so. The results of this test are as shown in Table 16.13.

TABLE 16.13

Correlated Random Effects - Hausman Test
Equation: Untitled
Test cross-section random effects

Test Summary              Chi-Sq. Statistic    Chi-Sq. d.f.    Prob.
Cross-section random         105.865216             2          0.0000

Cross-section random effects test comparisons:

Variable          Fixed         Random       Var(Diff.)    Prob.
Log(RESRCD)    -0.628967     -0.665570      0.000057     0.0000
Log(YDPC)       1.062439      0.980877      0.000081     0.0000

Although the coefficients of the two models in Tables 16.11 and 16.12 look quite similar, the Hausman test shows that this is not the case. The chi-square value is highly statistically significant. Therefore, we can choose the fixed effects model over the random effects model. This example brings out the important point that when the sample size is large, in our case 980 observations, even small differences in the estimated coefficients of the two models can be statistically significant. Thus, the coefficients of the Log(RESRCD) variable in the two models look reasonably close, but statistically they are not.
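The large-sample point can be seen in the Var(Diff.) column itself. Under the null hypothesis the random effects estimator is efficient, so the variance of the coefficient difference equals the difference of the two squared standard errors. The sketch below computes it from the coefficients and standard errors in Tables 16.11 and 16.12 (the helper name is illustrative): the price-coefficient gap of only about 0.037 is nearly five times the standard error of the difference, because that variance is only about 0.000057. The p values use the one-df chi-square tail, as in the table's last column.

```python
import math


def hausman_row(b_fe, se_fe, b_re, se_re):
    """Single-coefficient Hausman comparison.

    Under H0 the RE estimator is efficient, so
    Var(b_FE - b_RE) = Var(b_FE) - Var(b_RE).
    """
    var_diff = se_fe ** 2 - se_re ** 2
    stat = (b_fe - b_re) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    return var_diff, stat, p_value


# (FE coefficient, FE s.e., RE coefficient, RE s.e.) from Tables 16.11 and 16.12
rows = {
    "Log(RESRCD)": (-0.628967, 0.029089, -0.665570, 0.028088),
    "Log(YDPC)": (1.062439, 0.040280, 0.980877, 0.039257),
}
results = {name: hausman_row(*vals) for name, vals in rows.items()}
for name, (var_diff, stat, p) in results.items():
    print(f"{name}: Var(Diff.) = {var_diff:.6f}, chi-square(1) = {stat:.1f}, p = {p:.1e}")
```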
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EXAMPLE 16.3 

Beer 

Consumption, 
Income and 
Beer Tax 


To assess the impact of beer tax on beer consumption, Philip Cook investigated the rela¬ 
tionship between the two, after allowing for the effect of income. 21 His data pertain to 50 
states and Washington, D.C, for the period 1975-2000. In this example we study the 
relationship of per capita beer sales to tax rate and income, all at the state level. We pre¬ 
sent the results of pooled OLS, fixed effects, and random effects models in tabular form in 
Table 16.14. The dependent variable is per capita beer sales. 

These results are interesting. As per economic theory, we would expect a negative 
relationship between beer consumption and beer taxes, which is the case for the three 
models. The negative income effect on beer consumption would suggest that beer is an 
inferior good. An inferior good is one whose demand decreases as consumers' income 
rises. Maybe when their income rises, consumers prefer champagne! 

For our purpose, what is interesting is the difference in the estimated coefficients.
There is not much difference in the estimated coefficients between FEM and ECM.
As a matter of fact, the Hausman test produces a chi-square value of 3.4, which is not
significant at the 5 percent level for 2 df; the p value is 0.1783. Hence we cannot reject
the hypothesis that the ECM estimates are consistent.

The results based on OLS, however, are vastly different. The coefficient of the beer tax 
variable, in absolute value, is much smaller than that obtained from FEM or ECM. The 
income variable, although it has the negative sign, is not statistically significant, whereas 
the other two models show that it is highly significant. 

This example shows very vividly what could happen if we neglect the panel structure 
of the data and estimate a pooled regression. 






TABLE 16.14

Variable      OLS                FEM              REM
Constant      1.4192             1.7617           1.7542
              (24.37)            (52.23)          (39.22)
Beer tax      -0.0067            -0.0183          -0.0181
              (-2.13)            (-9.67)          (-9.69)
Income        -3.54(e-6)a        -0.000020        -0.000019
              (-1.12)            (-9.17)          (-9.10)
R²            0.0062             0.0052           0.0052

Notes: Figures in parentheses are the estimated t ratios.
a. -3.54(e-6) = -0.00000354.
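The gap between the pooled OLS and the FEM estimates in this example can be reproduced qualitatively with simulated data. The sketch below is not the beer-tax data: all function names and parameter values are hypothetical. It builds a panel in which each unit's intercept also shifts that unit's X values, so the intercept is correlated with the regressor. Pooled OLS then mixes this between-unit correlation into the slope, while the within (fixed effects) estimator, which demeans each unit's data, removes the unit intercepts.

```python
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_slope(panel):
    """Fixed effects (within-group) slope: demean x and y unit by unit, then OLS."""
    xd, yd = [], []
    for xs, ys in panel:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        xd += [a - mx for a in xs]
        yd += [b - my for b in ys]
    # The demeaned data have zero mean, so no intercept is needed
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

def simulate_panel(n_units=50, n_periods=20, beta=-0.5, seed=1):
    """Hypothetical panel: each unit's intercept alpha_i also shifts its X values,
    so alpha_i is correlated with the regressor."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        alpha_i = rng.gauss(0, 2)
        xs = [alpha_i + rng.gauss(0, 1) for _ in range(n_periods)]
        ys = [alpha_i + beta * x + rng.gauss(0, 0.5) for x in xs]
        panel.append((xs, ys))
    return panel

panel = simulate_panel()
x_all = [x for xs, _ in panel for x in xs]
y_all = [y for _, ys in panel for y in ys]
print("pooled OLS slope:", round(ols_slope(x_all, y_all), 3))
print("within (FE) slope:", round(within_slope(panel), 3))
```

With these settings the pooled slope tends toward about +0.3 as the sample grows (the wrong sign), whereas the within estimate stays close to the true value of -0.5. The real-data contrast in Table 16.14 is milder, but it has the same source: ignoring the panel structure.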



Summary and Conclusions


1. Panel regression models are based on panel data. Panel data consist of observations on 
the same cross-sectional, or individual, units over several time periods. 

2. There are several advantages to using panel data. First, they increase the sample size 
considerably. Second, by studying repeated cross-section observations, panel data are 
better suited to study the dynamics of change. Third, panel data enable us to study 
more complicated behavioral models. 

3. Despite their substantial advantages, panel data pose several estimation and inference 
problems. Since such data involve both cross-section and time dimensions, problems 
that plague cross-sectional data (e.g., heteroscedasticity) and time series data (e.g., 
autocorrelation) need to be addressed. There are some additional problems as well, 
such as cross-correlation in individual units at the same point in time. 


21 The data used here are obtained from the website of Michael P. Murray, Econometrics: A Modern
Introduction, Pearson/Addison Wesley, Boston, 2006, but the original data were collected by Philip
Cook for his book, Paying the Tab: The Costs and Benefits of Alcohol Control, Princeton University Press,
Princeton, New Jersey, 2007.
4. There are several estimation techniques to address one or more of these problems. The
two most prominent are (1) the fixed effects model (FEM) and (2) the random effects
model (REM), or error components model (ECM).

5. In FEM, the intercept in the regression model is allowed to differ among individuals in 
recognition of the fact that each individual, or cross-sectional, unit may have some special 
characteristics of its own. To take into account the differing intercepts, one can use dummy 
variables. The FEM using dummy variables is known as the least-squares dummy variable 
(LSDV) model. FEM is appropriate in situations where the individual-specific intercept 
may be correlated with one or more regressors. A disadvantage of LSDV is that it consumes 
a lot of degrees of freedom when the number of cross-sectional units, N, is very large, in 
which case we have to introduce N dummies (but suppress the common intercept term). 

6. An alternative to FEM is ECM. In ECM it is assumed that the intercept of an individual 
unit is a random drawing from a much larger population with a constant mean value. The 
individual intercept is then expressed as a deviation from this constant mean value. One 
advantage of ECM over FEM is that it is economical in degrees of freedom, as we do not 
have to estimate N cross-sectional intercepts. We need only to estimate the mean value of 
the intercept and its variance. ECM is appropriate in situations where the (random)
intercept of each cross-sectional unit is uncorrelated with the regressors. Another advantage
of ECM is that we can introduce variables such as gender, religion, and ethnicity, which
remain constant for a given subject. In FEM we cannot do that because all such variables
are collinear with the subject-specific intercept. Moreover, if we use the within-group
estimator or first-difference estimator, all such time-invariant variables will be swept out.

7. The Hausman test can be used to decide between FEM and ECM. We can also use the 
Breusch-Pagan test to see if ECM is appropriate. 

8. Despite their increasing popularity in applied research, and despite the increasing
availability of such data, panel data regressions may not be appropriate in every situation.
One has to use some practical judgment in each case.

9. There are some specific problems with panel data that need to be borne in mind. The 
most serious is the problem of attrition, whereby, for one reason or another, subjects of 
the panel drop out over time so that over subsequent surveys (or cross-sections) fewer 
original subjects remain in the panel. Even if there is no attrition, over time subjects may 
refuse or be unwilling to answer some questions. 
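Points 5 and 6 mention both the LSDV model and the within-group estimator. For the slope coefficients, the two give numerically identical answers, a consequence of the Frisch-Waugh-Lovell theorem. The following sketch checks this on a small hypothetical panel. The helper names are illustrative. The dense normal-equation solver is fine for a toy example, but it has to estimate N + 1 coefficients, so it becomes wasteful when there are many units. That is the degrees-of-freedom cost of LSDV noted in point 5.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lsdv_slope(panel):
    """LSDV: regress y on one dummy per unit (no common intercept) and x."""
    n_units = len(panel)
    rows, y = [], []
    for i, (xs, ys) in enumerate(panel):
        for x, yy in zip(xs, ys):
            rows.append([1.0 if j == i else 0.0 for j in range(n_units)] + [x])
            y.append(yy)
    k = n_units + 1
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yy for r, yy in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)[-1]          # the last coefficient is the slope on x

def within_slope(panel):
    """Within-group estimator: demean x and y within each unit, then OLS."""
    num = den = 0.0
    for xs, ys in panel:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        den += sum((a - mx) ** 2 for a in xs)
    return num / den

# Small hypothetical panel: 3 units observed for 4 periods each
panel = [([1.0, 2.0, 3.0, 4.0], [2.1, 2.9, 4.2, 4.8]),
         ([2.0, 3.0, 5.0, 6.0], [5.0, 5.8, 7.1, 8.2]),
         ([0.5, 1.5, 2.5, 3.0], [0.9, 1.7, 2.4, 3.1])]
print(lsdv_slope(panel), within_slope(panel))
```

The two printed slopes agree up to floating-point rounding. The within estimator reaches the same answer without ever forming the N dummy columns.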


Questions 

16.1. What are the special features of (a) cross-section data, (b) time series data, and
(c) panel data?

16.2. What is meant by a fixed effects model (FEM)? Since panel data have both time and 
space dimensions, how does FEM allow for both dimensions? 

16.3. What is meant by an error components model (ECM)? How does it differ from 
FEM? When is ECM appropriate? And when is FEM appropriate? 

16.4. Is there a difference between LSDV, within-estimator, and first-difference models?

16.5. When are panel data regression models inappropriate? Give examples. 

16.6. How would you extend model (16.4.2) to allow for a time error component? Write 
down the model explicitly. 

16.7. Refer to the data on eggs produced and their prices given in Table 1.1. Which model 
may be appropriate here, FEM or ECM? Why? 
16.8. For the investment data given in Table 1.2, which model would you choose, FEM
or REM? Why?

16.9. Based on the Michigan Income Dynamics Study, Hausman attempted to estimate
a wage, or earnings, model using a sample of 629 high school graduates, who
were followed for a period of six years, thus giving in all 3,774 observations. The
dependent variable in this study was the logarithm of wage, and the explanatory variables
were: age (divided into several age groups); unemployment in the previous year;
poor health in the previous year; self-employment; region of residence (for a graduate
from the South, South = 1 and 0 otherwise); and area of residence (for a graduate
from a rural area, Rural = 1 and 0 otherwise). Hausman used both FEM and ECM.
The results are given in Table 16.15 (standard errors in parentheses).


TABLE 16.15 Wage Equations (Dependent Variable: Log Wage)

Variable                         Fixed Effects        Random Effects
1. Age 1 (20-35)                 0.0557 (0.0042)      0.0393 (0.0033)
2. Age 2 (35-45)                 0.0351 (0.0051)      0.0092 (0.0036)
3. Age 3 (45-55)                 0.0209 (0.0055)      -0.0007 (0.0042)
4. Age 4 (55-65)                 0.0209 (0.0078)      -0.0097 (0.0060)
5. Age 5 (65- )                  -0.0171 (0.0155)     -0.0423 (0.0121)
6. Unemployed previous year      -0.0042 (0.0153)     -0.0277 (0.0151)
7. Poor health previous year     -0.0204 (0.0221)     -0.0250 (0.0215)
8. Self-employment               -0.2190 (0.0297)     -0.2670 (0.0263)
9. South                         -0.1569 (0.0656)     -0.0324 (0.0333)
10. Rural                        -0.0101 (0.0317)     -0.1215 (0.0237)
11. Constant                     n.a.                 0.8499 (0.0433)
S²                               0.0567               0.0694
Degrees of freedom               3,135                3,763

Source: Reproduced from Cheng Hsiao, Analysis of Panel Data, Cambridge University Press;
original source: J. A. Hausman, "Specification Tests in Econometrics," Econometrica,
vol. 46, 1978, pp. 1251-1271.

a. Do the results make economic sense? 

b. Is there a vast difference in the results produced by the two models? If so, what 
might account for these differences? 

c. On the basis of the data given in the table, which model, if any, would you choose? 

Empirical Exercises 

16.10. Refer to the airline example discussed in the text. Instead of the linear model given
in Eq. (16.4.2), estimate a log-linear regression model and compare your results
with those given in Table 16.2.

16.11. Refer to the data in Table 1.1. 

a. Let Y = eggs produced (in millions) and X = price of eggs (cents per dozen). 
Estimate the model for the years 1990 and 1991 separately. 

b. Pool the observations for the two years and estimate the pooled regression. What 
assumptions are you making in pooling the data? 

c. Use the fixed effects model, distinguishing the two years, and present the 
regression results. 

d. Can you use the fixed effects model, distinguishing the 50 states? Why or why not? 

e. Would it make sense to distinguish both the state effect and the year effect? If so, 
how many dummy variables would you have to introduce? 

f. Would the error components model be appropriate to model the production of
eggs? Why or why not? See if you can estimate such a model using, say, EViews.
16.12. Continue with Exercise 16.11. Before deciding to run the pooled regression, you
want to find out whether the data are “poolable.” For this purpose you decide to use 
the Chow test discussed in Chapter 8. Show the necessary calculations involved and 
determine if the pooled regression makes any sense. 

16.13. Use the investment data given in Table 1.6. 

a. Estimate the Grunfeld investment function for each company individually. 

b. Now pool the data for all the companies and estimate the Grunfeld investment 
function by OLS. 

c. Use LSDV to estimate the investment function and compare your results with
the pooled regression estimated in (b).

d. How would you decide between the pooled regression and the LSDV regression? 
Show the necessary calculations. 

16.14. Table 16.16 gives data on the index of hourly compensation in manufacturing in U.S.
dollars, Y (index, 1992 = 100), and the civilian unemployment rate, X (%), for
Canada, the United Kingdom, and the United States for the period 1980-2006.
Consider the model:

$$Y_{it} = \beta_1 + \beta_2 X_{it} + u_{it} \tag{1}$$


TABLE 16.16 Unemployment Rate and Hourly Compensation in Manufacturing, in the United States,
Canada, and the United Kingdom, 1980-2006.

Table B-109.

Year   COMP_U.S.   UN_U.S.   COMP_CAN   UN_CAN   COMP_U.K.   UN_U.K.
1980     55.9        7.1       49.0       7.3      47.1        6.9
1981     61.6        7.6       53.8       7.3      47.5        9.7
1982     67.2        9.7       60.1      10.7      45.1       10.8
1983     69.3        9.6       64.3      11.6      41.9       11.5
1984     71.6        7.5       65.0      10.9      39.8       11.8
1985     75.3        7.2       65.0      10.2      42.3       11.4
1986     78.8        7.0       64.9       9.3      52.0       11.4
1987     81.3        6.2       69.6       8.4      64.5       10.5
1988     84.1        5.5       78.5       7.4      74.8        8.6
1989     86.6        5.3       85.5       7.1      73.5        7.3
1990     90.5        5.6       92.4       7.7      89.6        7.1
1991     95.6        6.8      100.7       9.8      99.9        8.9
1992    100.0        7.5      100.0      10.6     100.0       10.0
1993    102.0        6.9       94.8      10.8      88.8       10.4
1994    105.3        6.1       92.1       9.6      92.8        8.7
1995    107.3        5.6       93.9       8.6      97.3        8.7
1996    109.3        5.4       95.9       8.8      96.0        8.1
1997    112.2        4.9       96.7       8.4     104.1        7.0
1998    118.7        4.5       94.9       7.7     113.8        6.3
1999    123.4        4.2       96.8       7.0     117.5        6.0
2000    134.7        4.0      100.0       6.1     114.8        5.5
2001    137.8        4.7       98.9       6.5     114.7        5.1
2002    147.8        5.8      101.0       7.0     126.8        5.2
2003    158.2        6.0      116.7       6.9     145.2        5.0
2004    161.5        5.5      127.1       6.4     171.4        4.8
2005    168.3        5.1      141.8       6.0     177.4        4.8
2006    172.4        4.6      155.5       5.5     192.3        5.5

Notes: UN = Unemployment rate, %.
COMP = Index of hourly compensation in U.S. dollars, 1992 = 100.
a. A priori, what is the expected relationship between Y and X? Why?

b. Estimate the model given in Eq. (1) for each country. 

c. Estimate the model, pooling all of the 81 observations. 

d. Estimate the fixed effects model. 

e. Estimate the error components model. 

f. Which is a better model, FEM or ECM? Justify your answer. (Hint: Apply the
Hausman test.)

16.15. Baltagi and Griffin considered the following gasoline demand function:* 

$$\ln Y_{it} = \beta_1 + \beta_2 \ln X_{2it} + \beta_3 \ln X_{3it} + \beta_4 \ln X_{4it} + u_{it}$$

where Y = gasoline consumption per car; \( X_2 \) = real income per capita, \( X_3 \) = real
gasoline price, \( X_4 \) = number of cars per capita, i = country code, in all 18 OECD
countries, and t = time (annual observations from 1960-1978). Note: Values in the
table are already logged.

a. Estimate the above demand function pooling the data for all 18 of the countries 
(a total of 342 observations). 

b. Estimate a fixed effects model using the same data. 

c. Estimate a random components model using the same data. 

d. From your analysis, which model best describes the gasoline demand in the 
18 OECD countries? Justify your answer. 

16.16. The article by Subhayu Bandyopadhyay and Howard J. Wall, “The Determinants of 
Aid in the Post-Cold War Era,” Review, Federal Reserve Bank of St. Louis, 
November/December 2007, vol. 89, number 6, pp. 533-547, uses panel data to 
estimate the responsiveness of aid to recipient countries’ economic and physical 
needs, civil/political rights, and government effectiveness. The data are for 
135 countries for three years. The article and data can be found at:
http://research.stlouisfed.org/publications/review/past/2007 in the November/December
Vol. 89, No. 10 section. The data can also be found on the textbook website in 
Table 16.18. Estimate the authors’ model (given on page 534 of their article) using 
a random effects estimator. Compare your results with those of the pooled and fixed 
effects estimators given by the authors in Table 2 of their article. Which model is 
appropriate here, fixed effects or random effects? Why? 

16.17. Refer to the airlines example discussed in the text. For each airline, estimate a time 
series logarithmic cost function. How do these regressions compare with the fixed 
effects and random effects models discussed in the chapter? Would you also
estimate 15 cross-section logarithmic cost functions? Why or why not?


*B. H. Baltagi and J. M. Griffin, "Gasoline Demand in the OECD: An Application of Pooling and
Testing Procedures," European Economic Review, vol. 22, 1983, pp. 117-137. The data for 18 OECD
countries for the years 1960-1978 can be obtained from:
http://www.wiley.com/legacy/wileychi/baltagi/supp/Gasoline.dat, or from the textbook website, Table 16.17.
17

Dynamic Econometric Models: Autoregressive and Distributed-Lag Models


In regression analysis involving time series data, if the regression model includes not only 
the current but also the lagged (past) values of the explanatory variables (the X’s), it is 
called a distributed-lag model. If the model includes one or more lagged values of the 
dependent variable among its explanatory variables, it is called an autoregressive model. 
Thus, 


$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + u_t$$


represents a distributed-lag model, whereas 


$$Y_t = \alpha + \beta X_t + \gamma Y_{t-1} + u_t$$


is an example of an autoregressive model. The latter are also known as dynamic models 
since they portray the time path of the dependent variable in relation to its past value(s). 
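To see the difference in dynamics, consider a permanent one-unit increase in X. In the distributed-lag model, the response of Y stops growing once the last lag has been passed. In the autoregressive model, each period's change in Y feeds into the next period through \( \gamma \). Assuming \( |\gamma| < 1 \), the response approaches \( \beta/(1-\gamma) \). The following is a small sketch with hypothetical coefficient values; the function names are illustrative only.

```python
def dl_path(betas, horizon):
    """Change in Y in periods 0..horizon-1 after X rises permanently by one unit,
    in the distributed-lag model Y_t = alpha + b0*X_t + b1*X_{t-1} + ... + u_t."""
    return [sum(betas[: t + 1]) for t in range(horizon)]

def ar_path(beta, gamma, horizon):
    """Same experiment for the autoregressive model Y_t = alpha + beta*X_t + gamma*Y_{t-1} + u_t."""
    path, dy = [], 0.0
    for _ in range(horizon):
        dy = beta + gamma * dy      # X stays one unit higher; last period's change in Y feeds back
        path.append(dy)
    return path

# Hypothetical coefficients, chosen only for illustration
print([round(v, 3) for v in dl_path([0.5, 0.3, 0.1], 5)])
print([round(v, 3) for v in ar_path(0.5, 0.4, 5)])
```

The distributed-lag path levels off at the sum of its coefficients after the last lag. The autoregressive path keeps rising toward 0.5/(1 - 0.4), about 0.83, without ever quite reaching it.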

Autoregressive and distributed-lag models are used extensively in econometric analysis, 
and in this chapter we take a close look at such models with a view to finding out the 
following: 

1. What is the role of lags in economics? 

2. What are the reasons for the lags? 

3. Is there any theoretical justification for the commonly used lagged models in empirical 
econometrics? 

4. What is the relationship, if any, between autoregressive and distributed-lag models? 
Can one be derived from the other? 

5. What are some of the statistical problems involved in estimating such models? 

6. Does a lead-lag relationship between variables imply causality? If so, how does one 
measure it? 


17.1 The Role of “Time,” or “Lag,” in Economics


In economics the dependence of a variable Y (the dependent variable) on another
variable(s) X (the explanatory variable) is rarely instantaneous. Very often, Y responds to X
with a lapse of time. Such a lapse of time is called a lag. To illustrate the nature of the lag,
we consider several examples.


EXAMPLE 17.1 The Consumption Function

Suppose a person receives a salary increase of $2,000 in annual pay, and suppose that this
is a "permanent" increase in the sense that the increase in salary is maintained. What will
be the effect of this increase in income on the person's annual consumption expenditure?

Following such a gain in income, people usually do not rush to spend all the increase 
immediately. Thus, our recipient may decide to increase consumption expenditure by 
$800 in the first year following the increase in income, by another $600 in the next
year, and by another $400 in the following year, saving the remainder. By the end of the 
third year, the person's annual consumption expenditure will be increased by $1,800. We 
can thus write the consumption function as 

$$Y_t = \text{constant} + 0.4X_t + 0.3X_{t-1} + 0.2X_{t-2} + u_t \tag{17.1.1}$$

where Y is consumption expenditure and X is income.

Equation (17.1.1) shows that the effect of an increase in income of $2,000 is spread, or 
distributed, over a period of 3 years. Models such as Eq. (17.1.1) are therefore called 
distributed-lag models because the effect of a given cause (income) is spread over a 
number of time periods. Geometrically, the distributed-lag model (17.1.1) is shown in
Figure 17.1, or alternatively, in Figure 17.2.


FIGURE 17.1 Example of distributed lags.
More generally we may write

$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + u_t \tag{17.1.2}$$

which is a distributed-lag model with a finite lag of k time periods. The coefficient \( \beta_0 \) is
known as the short-run, or impact, multiplier because it gives the change in the mean
value of Y following a unit change in X in the same time period.¹ If the change in X is
maintained at the same level thereafter, then \( (\beta_0 + \beta_1) \) gives the change in (the mean
value of) Y in the next period, \( (\beta_0 + \beta_1 + \beta_2) \) in the following period, and so on. These
partial sums are called interim, or intermediate, multipliers. Finally, after k periods we
obtain


$$\sum_{i=0}^{k} \beta_i = \beta_0 + \beta_1 + \cdots + \beta_k = \beta \tag{17.1.3}$$

which is known as the long-run, or total, distributed-lag multiplier, provided the sum \( \beta \)
exists (to be discussed elsewhere).

If we define 


$$\beta_i^* = \frac{\beta_i}{\sum_i \beta_i} = \frac{\beta_i}{\beta} \tag{17.1.4}$$

we obtain “standardized” \( \beta_i \). Partial sums of the standardized \( \beta_i \) then give the proportion
of the long-run, or total, impact felt by a certain time period.

Returning to the consumption regression (17.1.1), we see that the short-run multiplier,
which is nothing but the short-run marginal propensity to consume (MPC), is 0.4, whereas
the long-run multiplier, which is the long-run marginal propensity to consume, is 0.4 + 0.3 +
0.2 = 0.9. That is, following a $1 increase in income, the consumer will increase his or her
level of consumption by about 40 cents in the year of increase, by another 30 cents in the
next year, and by yet another 20 cents in the following year. The long-run impact of an
increase of $1 in income is thus 90 cents. If we divide each \( \beta_i \) by 0.9, we obtain,
respectively, 0.44, 0.33, and 0.22, which indicate that 44 percent of the total impact of a unit
change in X on Y is felt immediately, about 78 percent after one year, and 100 percent by the end
of the second year.
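The multiplier arithmetic for Eq. (17.1.1) can be checked mechanically. The sketch below uses a hypothetical function name. It takes the lag coefficients of a finite distributed-lag model and returns four quantities: the impact multiplier, the interim multipliers (partial sums), the long-run multiplier of Eq. (17.1.3), and the standardized weights of Eq. (17.1.4).

```python
def multipliers(betas):
    """Impact, interim, and long-run multipliers and standardized weights
    for a finite distributed-lag model with lag coefficients betas[0..k]."""
    interim, running = [], 0.0
    for b in betas:
        running += b
        interim.append(running)                    # beta_0, beta_0 + beta_1, ...
    long_run = interim[-1]                         # sum of all beta_i, Eq. (17.1.3)
    standardized = [b / long_run for b in betas]   # Eq. (17.1.4)
    return {"impact": betas[0], "interim": interim,
            "long_run": long_run, "standardized": standardized}

m = multipliers([0.4, 0.3, 0.2])                   # consumption function (17.1.1)
print(round(m["long_run"], 2))                     # 0.9
print([round(w, 2) for w in m["standardized"]])    # [0.44, 0.33, 0.22]
print([round(2000 * b) for b in [0.4, 0.3, 0.2]])  # [800, 600, 400]
```

The last line multiplies the lag coefficients by the $2,000 raise, which reproduces the $800, $600, and $400 increments of the example.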


EXAMPLE 17.2 Creation of Bank Money (Demand Deposits)


Suppose the Federal Reserve System pours $1,000 of new money into the banking system 
by buying government securities. What will be the total amount of bank money, or 
demand deposits, that will be generated ultimately? 

Following the fractional reserve system, if we assume that the law requires banks to 
keep a 20 percent reserve backing for the deposits they create, then by the well-known 
multiplier process the total amount of demand deposits that will be generated will 
be equal to $1,000[1/(1 - 0.8)] = $5,000. Of course, $5,000 in demand deposits will 
not be created overnight. The process takes time, which can be shown schematically in 
Figure 17.3. 



1 Technically, \( \beta_0 \) is the partial derivative of Y with respect to \( X_t \), \( \beta_1 \) that with respect to \( X_{t-1} \), \( \beta_2 \) that
with respect to \( X_{t-2} \), and so forth. Symbolically, \( \partial Y_t/\partial X_{t-k} = \beta_k \).


FIGURE 17.3 Cumulative expansion in bank deposits (initial reserve $1,000 and 20 percent
reserve requirement).

[Figure: cumulative deposits plotted against stages in expansion, starting from $1,000.]
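The stage-by-stage expansion in Figure 17.3 can be tabulated directly. This is a minimal sketch under the textbook simplifications: every loan is redeposited in full, banks hold no excess reserves, and the public holds no extra currency. The function name is hypothetical.

```python
def deposit_expansion(initial=1000.0, reserve_ratio=0.2, stages=10):
    """Cumulative demand deposits after each stage of the simple multiplier process.

    Assumes banks lend out everything above the required reserve and every
    loan is redeposited in full (no currency drain, no excess reserves)."""
    cumulative, new, total = [], initial, 0.0
    for _ in range(stages):
        total += new
        cumulative.append(total)
        new *= 1 - reserve_ratio          # the part of new deposits that is lent out again
    return cumulative

path = deposit_expansion()
print([round(v) for v in path[:4]])       # [1000, 1800, 2440, 2952]
print(round(1000 / 0.2))                  # 5000, the limit $1,000 x 1/(1 - 0.8)
```

Each stage adds 80 percent of the previous stage's new deposits, so the cumulative total is a geometric series converging to the $5,000 limit given in the example.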


EXAMPLE 17.3 Link between Money and Prices


According to the monetarists, inflation is essentially a monetary phenomenon in the sense
that a continuous increase in the general price level is due to the rate of expansion in
money supply far in excess of the amount of money actually demanded by the economic
units. Of course, this link between inflation and changes in money supply is not
instantaneous. Studies have shown that the lag between the two is anywhere from 3 to about
20 quarters. The results of one such study are shown in Table 17.1,² where we see that the
effect of a 1 percent change in the M1B money supply (= currency + checkable deposits at
financial institutions) is felt over a period of 20 quarters. The long-run impact of a 1 percent
change in the money supply on inflation is about 1 (= \( \sum m_i \)), which is statistically
significant, whereas the short-run impact is about 0.04, which is not significant, although
the intermediate multipliers seem to be generally significant. Incidentally, note that since
P and M are both in percent forms, the \( m_i \) (\( \beta_i \) in our usual notation) give the elasticity of
P with respect to M, that is, the percent response of prices to a 1 percent increase in the
money supply. Thus, \( m_0 = 0.041 \) means that for a 1 percent increase in the money supply
the short-run elasticity of prices is about 0.04; that is, prices rise by about 0.04 percent. The
long-term elasticity is 1.03, implying that in the long run a 1 percent increase in the money supply is reflected
by just about the same percentage increase in the prices. In short, a 1 percent increase
in the money supply is accompanied in the long run by a 1 percent increase in the
inflation rate.


2 Keith M. Carlson, "The Lag from Money to Prices," Review, Federal Reserve Bank of St. Louis, 
October 1980, Table 1, p. 4. 
TABLE 17.1 Estimate of Money-Price Equation: Original Specification
Sample period: 1955-I to 1969-IV: \( m_{21} = 0 \)

$$P = \underset{(0.395)}{-0.146} + \sum_{i=0}^{20} m_i M_{-i}$$

         Coeff.   |t|                Coeff.   |t|                Coeff.    |t|
m0       0.041    1.276      m8      0.048    3.249      m16       0.069     3.943
m1       0.034    1.538      m9      0.054    3.783      m17       0.062     3.712
m2       0.030    1.903      m10     0.059    4.305      m18       0.053     3.511
m3       0.029    2.171      m11     0.065    4.673      m19       0.039     3.338
m4       0.030    2.235      m12     0.069    4.795      m20       0.022     3.191
m5       0.033    2.294      m13     0.072    4.694      Σm        1.031     7.870
m6       0.037    2.475      m14     0.073    4.468      Mean lag  10.959    5.634
m7       0.042    2.798      m15     0.072    4.202

R² = 0.525      se = 1.066      D.W. = 2.00

Notation: P = compounded annual rate of change of GNP deflator.
M = compounded annual rate of change of M1B.
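Two summary figures in Table 17.1 can be recomputed from the printed coefficients. The sketch assumes the conventional definition of the mean lag as the coefficient-weighted average lag, \( \sum i\,m_i / \sum m_i \). Because the printed \( m_i \) are rounded to three decimals, the results only approximate the table's 1.031 and 10.959. The function `half_effect_lag` is a hypothetical extra that reports the first quarter by which half of the total effect has appeared.

```python
# Lag coefficients m_0, ..., m_20 as printed (rounded) in Table 17.1
M = [0.041, 0.034, 0.030, 0.029, 0.030, 0.033, 0.037, 0.042,
     0.048, 0.054, 0.059, 0.065, 0.069, 0.072, 0.073, 0.072,
     0.069, 0.062, 0.053, 0.039, 0.022]

def long_run_multiplier(coefs):
    """Total (long-run) effect: the sum of all lag coefficients."""
    return sum(coefs)

def mean_lag(coefs):
    """Mean lag: each lag length i weighted by its coefficient m_i."""
    return sum(i * c for i, c in enumerate(coefs)) / sum(coefs)

def half_effect_lag(coefs):
    """First lag by which at least half of the long-run effect has been felt."""
    total, running = sum(coefs), 0.0
    for i, c in enumerate(coefs):
        running += c
        if running >= total / 2:
            return i
    return len(coefs) - 1

print(round(long_run_multiplier(M), 3))  # 1.033 (the table's 1.031 presumably uses unrounded m_i)
print(round(mean_lag(M), 2))             # about 10.95, close to the table's 10.959
print(half_effect_lag(M))                # 12
```

On these figures, roughly half of the long-run effect of money growth on inflation appears only after about three years (12 quarters). This is consistent with the long lags described in the text.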


EXAMPLE 17.4 Lag between R&D Expenditure and Productivity


The decision to invest in research and development (R&D) expenditure and its ultimate 
payoff in terms of increased productivity involve considerable lag, actually several lags, 
such as, "... the lag between the investment of funds and the time inventions actually 
begin to appear, the lag between the invention of an idea or device and its development 
up to a commercially applicable stage, and the lag which is introduced by the process of 
diffusion: it takes time before all the old machines are replaced by the better new ones." 3 


EXAMPLE 17.5 The J Curve of International Economics


Students of international economics are familiar with what is called the J curve, which
shows the relationship between trade balance and depreciation of currency. Following 
depreciation of a country's currency (e.g., due to devaluation), initially the trade balance 
deteriorates but eventually it improves, assuming other things are the same. The curve is 
as shown in Figure 17.4. 


FIGURE 17.4 The J curve.

[Figure: current account (in domestic output units) over time; labels mark where "Real
depreciation takes place and J curve begins" and the "End of J curve."]


3 Zvi Griliches, "Distributed Lags: A Survey," Econometrica, vol. 35, no. 1, January 1967, pp. 16-49.
EXAMPLE 17.6 The Accelerator Model of Investment

In its simplest form, the acceleration principle of investment theory states that investment
is proportional to changes in output. Symbolically,

$$I_t = \beta(X_t - X_{t-1}) \qquad \beta > 0 \tag{17.1.5}$$

where \( I_t \) is investment at time t, \( X_t \) is output at time t, and \( X_{t-1} \) is output at time (t − 1).


The preceding examples are only a sample of the use of lag in economics. Undoubtedly, 
the reader can produce several examples from his or her own experience. 

17.2 The Reasons for Lags⁴


Although the examples cited in Section 17.1 point out the nature of lagged phenomena,
they do not fully explain why lags occur. There are three main reasons:

1. Psychological reasons. As a result of the force of habit (inertia), people do not change 
their consumption habits immediately following a price decrease or an income increase 
perhaps because the process of change may involve some immediate disutility. Thus, 
those who become instant millionaires by winning lotteries may not change the 
lifestyles to which they were accustomed for a long time because they may not know 
how to react to such a windfall gain immediately. Of course, given reasonable time, they 
may learn to live with their newly acquired fortune. Also, people may not know whether 
a change is “permanent” or “transitory.” Thus, my reaction to an increase in my income 
will depend on whether or not the increase is permanent. If it is only a nonrecurring 
increase and in succeeding periods my income returns to its previous level, I may save 
the entire increase, whereas someone else in my position might decide to “live it up.” 

2. Technological reasons. Suppose the price of capital relative to labor declines, making
substitution of capital for labor economically feasible. Of course, addition of capital
takes time (the gestation period). Moreover, if the drop in price is expected to be
temporary, firms may not rush to substitute capital for labor, especially if they expect that after
the temporary drop the price of capital may increase beyond its previous level.
Sometimes, imperfect knowledge also accounts for lags. At present the market for personal
computers is glutted with all kinds of computers with varying features and prices.
Moreover, since their introduction in the late 1970s, the prices of most personal computers
have dropped dramatically. As a result, prospective buyers of personal computers
may hesitate to buy until they have had time to look into the features and prices of all the
competing brands. Moreover, they may hesitate to buy in the expectation of further
declines in price or further innovations.

3. Institutional reasons. These reasons also contribute to lags. For example, contractual 
obligations may prevent firms from switching from one source of labor or raw material to 
another. As another example, those who have placed funds in long-term savings accounts 
for fixed durations such as one year, three years, or seven years are essentially “locked in” 
even though money market conditions may be such that higher yields are available
elsewhere. Similarly, employers often give their employees a choice among several health
insurance plans, but once a choice is made, an employee may not switch to another plan 
for at least one year. Although this may be done for administrative convenience, the 
employee is locked in for one year. 

4 This section leans heavily on Marc Nerlove, Distributed Lags and Demand Analysis for Agricultural and
Other Commodities, Agricultural Handbook No. 141, U.S. Department of Agriculture, June 1958.
For the reasons just discussed, lag occupies a central role in economics. This is clearly
reflected in the short-run-long-run methodology of economics. It is for this reason that we say
that short-run price or income elasticities are generally smaller (in absolute value) than the
corresponding long-run elasticities or that the short-run marginal propensity to consume is
generally smaller than the long-run marginal propensity to consume.

17.3 Estimation of Distributed-Lag Models 

Granted that distributed-lag models play a highly useful role in economics, how does one 
estimate such models? Specifically, suppose we have the following distributed-lag model in 
one explanatory variable:⁵

$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + u_t \tag{17.3.1}$$

where we have not defined the length of the lag, that is, how far back into the past we want
to go. Such a model is called an infinite (lag) model, whereas a model of the type shown
in Eq. (17.1.2) is called a finite (lag) distributed-lag model because the length of the lag
k is specified. We shall continue to use Eq. (17.3.1) because it is easy to handle
mathematically, as we shall see.⁶

How do we estimate the α and β's of Eq. (17.3.1)? We may adopt two approaches: (1) ad
hoc estimation and (2) a priori restrictions on the β's by assuming that the β's follow some
systematic pattern. We shall consider ad hoc estimation in this section and the other
approach in Section 17.4.

Ad Hoc Estimation of Distributed-Lag Models 

Since the explanatory variable \( X_t \) is assumed to be nonstochastic (or at least uncorrelated
with the disturbance term \( u_t \)), \( X_{t-1} \), \( X_{t-2} \), and so on, are nonstochastic, too. Therefore, in
principle, ordinary least squares (OLS) can be applied to Eq. (17.3.1). This is the
approach taken by Alt⁷ and Tinbergen.⁸ They suggest that to estimate Eq. (17.3.1) one may
proceed sequentially; that is, first regress \( Y_t \) on \( X_t \), then regress \( Y_t \) on \( X_t \) and \( X_{t-1} \), then
regress \( Y_t \) on \( X_t \), \( X_{t-1} \), and \( X_{t-2} \), and so on. This sequential procedure stops when the
regression coefficients of the lagged variables start becoming statistically insignificant
and/or the coefficient of at least one of the variables changes sign from positive to negative
or vice versa. Following this precept, Alt regressed fuel oil consumption Y on new orders X.
Based on the quarterly data for the period 1930-1939, the results were as follows:

$$\hat{Y}_t = 8.37 + 0.171X_t$$

$$\hat{Y}_t = 8.27 + 0.111X_t + 0.064X_{t-1}$$

$$\hat{Y}_t = 8.27 + 0.109X_t + 0.071X_{t-1} - 0.055X_{t-2}$$

$$\hat{Y}_t = 8.32 + 0.108X_t + 0.063X_{t-1} + 0.022X_{t-2} - 0.020X_{t-3}$$


5 If there is more than one explanatory variable in the model, each variable may have a lagged effect
on Y. For simplicity only, we assume one explanatory variable.

6 In practice, however, the coefficients of the distant X values are expected to have a negligible effect
on Y.

7 F. F. Alt, "Distributed Lags," Econometrica, vol. 10, 1942, pp. 113-128. 

8 J. Tinbergen, "Long-Term Foreign Trade Elasticities," Metroeconomica, vol. 1, 1949, pp. 174-185. 





Alt chose the second regression as the “best” one because in the last two equations the sign
of \( X_{t-2} \) was not stable and in the last equation the sign of \( X_{t-3} \) was negative, which may be
difficult to interpret economically.
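The Alt-Tinbergen sequential procedure can be sketched on simulated data. Everything here is hypothetical. X is generated as a positively autocorrelated series, and the true model contains only \( X_t \) and \( X_{t-1} \). The helper functions are illustrative pure-Python OLS routines, not Alt's computations. Each pass adds one more lag and reports the coefficients and t ratios, which is the information the stopping rule uses.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_t(rows, y):
    """OLS coefficients and their t ratios."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    resid = [yi - sum(bj * rj for bj, rj in zip(b, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    # Diagonal of (X'X)^{-1}: solve X'X z = e_i for each unit vector e_i
    diag = [solve(xtx, [1.0 if j == i else 0.0 for j in range(k)])[i] for i in range(k)]
    return b, [bj / (s2 * d) ** 0.5 for bj, d in zip(b, diag)]

def sequential_lags(x, y, max_lag):
    """Regress Y_t on X_t, then on X_t and X_{t-1}, and so on up to max_lag.
    All regressions use the same sample (t >= max_lag) so they are comparable."""
    results = []
    for p in range(max_lag + 1):
        rows = [[1.0] + [x[t - j] for j in range(p + 1)] for t in range(max_lag, len(x))]
        results.append(ols_t(rows, y[max_lag:]))
    return results

# Hypothetical data: X is positively autocorrelated; the true model has lags 0 and 1 only
rng = random.Random(7)
x = [0.0]
for _ in range(119):
    x.append(0.8 * x[-1] + rng.gauss(0, 1))
y = [8.0 + 0.10 * x[t] + 0.06 * (x[t - 1] if t > 0 else 0.0) + rng.gauss(0, 0.1)
     for t in range(len(x))]

for p, (b, t) in enumerate(sequential_lags(x, y, 3)):
    print(f"lags 0..{p}:", [round(v, 3) for v in b], "t:", [round(v, 1) for v in t])
```

Holding the sample fixed at t ≥ max_lag is a design choice that keeps successive regressions comparable. Estimating each regression on its own largest available sample is an equally common alternative.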

Although seemingly straightforward, ad hoc estimation suffers from many drawbacks, 
such as the following: 

1. There is no a priori guide to the maximum length of the lag.⁹

2. As one estimates successive lags, fewer degrees of freedom are left, making statistical inference somewhat shaky. Economists are not usually lucky enough to have a series long enough to estimate numerous lags.

3. More importantly, in economic time series data, successive values (lags) tend to be highly correlated; hence multicollinearity rears its ugly head. As noted in Chapter 10, multicollinearity leads to imprecise estimation; that is, the standard errors tend to be large in relation to the estimated coefficients. As a result, based on the routinely computed t ratios, we may (erroneously) declare one or more lagged coefficients statistically insignificant.

4. The sequential search for the lag length opens the researcher to the charge of data mining. Also, as we noted in Section 13.4, the gap between the nominal and the true level of significance for testing statistical hypotheses becomes an important issue in such sequential searches (see Eq. [13.4.2]).

In view of the preceding problems, the ad hoc estimation procedure has very little to recommend it. Clearly, some prior or theoretical considerations must be brought to bear upon the various β's if we are to make headway with the estimation problem.


17.4 The Koyck Approach to Distributed-Lag Models 


Koyck has proposed an ingenious method of estimating distributed-lag models. Suppose we start with the infinite lag distributed-lag model (17.3.1). Assuming that the β's are all of the same sign, Koyck assumes that they decline geometrically as follows:¹⁰


$$\beta_k = \beta_0 \lambda^k \qquad k = 0, 1, \ldots \qquad (17.4.1)$$ ¹¹


where $\lambda$, such that $0 < \lambda < 1$, is known as the rate of decline, or decay, of the distributed lag and where $1 - \lambda$ is known as the speed of adjustment.

What Eq. (17.4.1) postulates is that each successive β coefficient is numerically smaller than the one preceding it (this statement follows since $\lambda < 1$), implying that as one goes back into the distant past, the effect of that lag on $Y_t$ becomes progressively smaller, a quite plausible assumption. After all, current and recent past incomes are expected to affect current consumption expenditure more heavily than income in the distant past. Geometrically, the Koyck scheme is depicted in Figure 17.5.

As this figure shows, the value of the lag coefficient $\beta_k$ depends, apart from the common $\beta_0$, on the value of $\lambda$. The closer $\lambda$ is to 1, the slower the rate of decline in $\beta_k$, whereas the


9 If the lag length, k, is incorrectly specified, we will have to contend with the problem of misspecification errors discussed in Chapter 13. Also keep in mind the warning about data mining.

10 L. M. Koyck, Distributed Lags and Investment Analysis, North Holland Publishing Company, Amsterdam, 1954.

11 Sometimes this is also written as

$$\beta_k = \beta_0(1 - \lambda)\lambda^k \qquad k = 0, 1, \ldots$$

for reasons given in footnote 12.
FIGURE 17.5 Koyck scheme (declining geometric distribution).



closer it is to zero, the more rapid the decline in $\beta_k$. In the former case, distant past values of $X$ will exert a sizable impact on $Y_t$, whereas in the latter case their influence on $Y_t$ will peter out quickly. This pattern can be seen clearly from the following illustration:


λ      β0   β1       β2       β3       β4        β5        ···   β10
0.75   β0   0.75β0   0.56β0   0.42β0   0.32β0    0.24β0    ···   0.06β0
0.25   β0   0.25β0   0.06β0   0.02β0   0.004β0   0.001β0   ···   0.0
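The entries in this illustration follow directly from Eq. (17.4.1), since $\beta_k/\beta_0 = \lambda^k$. The short sketch below regenerates them. The function name is illustrative only.

```python
def koyck_relative_weights(lam, max_k):
    """Return beta_k / beta_0 = lam**k for k = 0, ..., max_k, per Eq. (17.4.1)."""
    if not 0 < lam < 1:
        raise ValueError("the Koyck scheme requires 0 < lambda < 1")
    return [lam ** k for k in range(max_k + 1)]

for lam in (0.75, 0.25):
    w = koyck_relative_weights(lam, 10)
    # the lags shown in the illustration: 0 through 5, then 10
    print(lam, [round(w[k], 3) for k in (0, 1, 2, 3, 4, 5, 10)])
```

Rounded to two or three decimals, these values reproduce the illustration. With $\lambda = 0.75$ the tenth lag still carries about 6 percent of $\beta_0$, whereas with $\lambda = 0.25$ it is essentially zero.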


Note these features of the Koyck scheme: (1) By assuming nonnegative values for $\lambda$, Koyck rules out the β's from changing sign; (2) by assuming $\lambda < 1$, he gives lesser weight to the distant β's than to the current ones; and (3) he ensures that the sum of the β's, which gives the long-run multiplier, is finite, namely,


$$\sum_{k=0}^{\infty} \beta_k = \beta_0\left(\frac{1}{1 - \lambda}\right) \qquad (17.4.2)$$ ¹²
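Eq. (17.4.2) can be checked numerically. Summing the geometric weights term by term converges to $\beta_0/(1-\lambda)$, and the share of the long-run effect accumulated through lag k is $1 - \lambda^{k+1}$. The values of $\beta_0$ and $\lambda$ in this sketch are arbitrary choices for illustration.

```python
def long_run_multiplier(beta0, lam):
    """Eq. (17.4.2): the sum of beta0 * lam**k over k = 0, 1, 2, ... equals beta0 / (1 - lam)."""
    return beta0 / (1.0 - lam)

def share_through_lag(lam, k):
    """Fraction of the long-run impact reached by lag k: 1 - lam**(k + 1)."""
    return 1.0 - lam ** (k + 1)

beta0, lam = 0.4, 0.6          # illustrative values only
partial = 0.0
for k in range(60):
    partial += beta0 * lam ** k
print(partial, long_run_multiplier(beta0, lam))   # the two agree to many decimals
print([round(share_through_lag(lam, k), 3) for k in range(4)])
```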

As a result of Eq. (17.4.1), the infinite lag model (17.3.1) may be written as

$$Y_t = \alpha + \beta_0 X_t + \beta_0\lambda X_{t-1} + \beta_0\lambda^2 X_{t-2} + \cdots + u_t \qquad (17.4.3)$$

As it stands, the model is still not amenable to easy estimation since a large (literally infinite) number of parameters remain to be estimated and the parameter $\lambda$ enters in a highly


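12 This is because

$$\sum_{k=0}^{\infty} \beta_k = \beta_0(1 + \lambda + \lambda^2 + \lambda^3 + \cdots) = \beta_0\left(\frac{1}{1 - \lambda}\right)$$

since the expression in the parentheses on the right side is an infinite geometric series whose sum is $1/(1 - \lambda)$ provided $0 < \lambda < 1$. In passing, note that if $\beta_k$ is as defined in footnote 11,

$$\sum_{k=0}^{\infty} \beta_k = \beta_0(1 - \lambda)/(1 - \lambda) = \beta_0$$

thus ensuring that the weights $(1 - \lambda)\lambda^k$ sum to 1.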
nonlinear form. Strictly speaking, the method of linear (in the parameters) regression analysis cannot be applied to such a model. But now Koyck suggests an ingenious way out. He lags Eq. (17.4.3) by one period to obtain

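$$Y_{t-1} = \alpha + \beta_0 X_{t-1} + \beta_0\lambda X_{t-2} + \beta_0\lambda^2 X_{t-3} + \cdots + u_{t-1} \qquad (17.4.4)$$

He then multiplies Eq. (17.4.4) by $\lambda$ to obtain

$$\lambda Y_{t-1} = \lambda\alpha + \lambda\beta_0 X_{t-1} + \beta_0\lambda^2 X_{t-2} + \beta_0\lambda^3 X_{t-3} + \cdots + \lambda u_{t-1} \qquad (17.4.5)$$

Subtracting Eq. (17.4.5) from Eq. (17.4.3), he gets

$$Y_t - \lambda Y_{t-1} = \alpha(1 - \lambda) + \beta_0 X_t + (u_t - \lambda u_{t-1}) \qquad (17.4.6)$$

or, rearranging,

$$Y_t = \alpha(1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + v_t \qquad (17.4.7)$$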


where $v_t = (u_t - \lambda u_{t-1})$, a moving average of $u_t$ and $u_{t-1}$.
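Because the Koyck transformation is pure algebra, it can be verified numerically. The sketch below uses arbitrary parameter values and simulated $X_t$ and $u_t$. It builds $Y_t$ from Eq. (17.4.3), truncating the infinite sum at the start of the sample, and confirms that every observation satisfies Eq. (17.4.7) up to floating-point rounding.

```python
import random

def koyck_identity_gap(alpha=2.0, beta0=0.6, lam=0.7, n=60, seed=1):
    """Largest |Y_t - [alpha(1-lam) + beta0 X_t + lam Y_{t-1} + (u_t - lam u_{t-1})]|."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Eq. (17.4.3): Y_t = alpha + beta0 * sum_k lam**k X_{t-k} + u_t (sum starts at t = 0)
    y = [alpha + beta0 * sum(lam ** k * x[t - k] for k in range(t + 1)) + u[t]
         for t in range(n)]
    gap = 0.0
    for t in range(1, n):
        v_t = u[t] - lam * u[t - 1]                       # moving-average error
        rhs = alpha * (1 - lam) + beta0 * x[t] + lam * y[t - 1] + v_t
        gap = max(gap, abs(y[t] - rhs))
    return gap

print(koyck_identity_gap())   # a tiny number: only floating-point rounding error remains
```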

The procedure just described is known as the Koyck transformation. Comparing Eq. (17.4.7) with Eq. (17.3.1), we see the tremendous simplification accomplished by Koyck. Whereas before we had to estimate $\alpha$ and an infinite number of β's, we now have to estimate only three unknowns: $\alpha$, $\beta_0$, and $\lambda$. Now there is no reason to expect multicollinearity. In a sense, multicollinearity is resolved by replacing $X_{t-1}, X_{t-2}, \ldots$ by a single variable, namely, $Y_{t-1}$. But note the following features of the Koyck transformation:

1. We started with a distributed-lag model but ended up with an autoregressive model because $Y_{t-1}$ appears as one of the explanatory variables. This transformation shows how one can “convert” a distributed-lag model into an autoregressive model.

2. The appearance of $Y_{t-1}$ is likely to create some statistical problems. $Y_{t-1}$, like $Y_t$, is stochastic, which means that we have a stochastic explanatory variable in the model. Recall that the classical least-squares theory is predicated on the assumption that the explanatory variables either are nonstochastic or, if stochastic, are distributed independently of the stochastic disturbance term. Hence, we must find out if $Y_{t-1}$ satisfies this assumption. (We shall return to this point in Section 17.8.)

3. In the original model (17.3.1) the disturbance term was $u_t$, whereas in the transformed model it is $v_t = (u_t - \lambda u_{t-1})$. The statistical properties of $v_t$ depend on what is assumed about the statistical properties of $u_t$, for, as shown later, if the original $u_t$'s are serially uncorrelated, the $v_t$'s are serially correlated. Therefore, we may have to face up to the serial correlation problem in addition to the stochastic explanatory variable $Y_{t-1}$. We shall do that in Section 17.8.

4. The presence of lagged Y violates one of the assumptions underlying the Durbin-Watson d test. Therefore, we will have to develop an alternative to test for serial correlation in the presence of lagged Y. One alternative is the Durbin h test, which is discussed in Section 17.10.

As we saw in Eq. (17.1.4), the partial sums of the standardized $\beta_i$ tell us the proportion of the long-run, or total, impact felt by a certain time period. In practice, though, the mean or median lag is often used to characterize the nature of the lag structure of a distributed-lag model.
The Median Lag

The median lag is the time required for the first half, or 50 percent, of the total change in $Y$ following a unit sustained change in $X$. For the Koyck model, the median lag is as follows (see Exercise 17.6):

$$\text{Koyck model: Median lag} = -\frac{\log 2}{\log \lambda} \qquad (17.4.8)$$

Thus, if $\lambda = 0.2$ the median lag is 0.4306, but if $\lambda = 0.8$ the median lag is 3.1067. Verbally, in the former case 50 percent of the total change in $Y$ is accomplished in less than half a period, whereas in the latter case it takes more than 3 periods to accomplish the 50 percent change. But this contrast should not be surprising, for as we know, the higher the value of $\lambda$ the lower the speed of adjustment, and the lower the value of $\lambda$ the greater the speed of adjustment.
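One way to see where formula (17.4.8) comes from: counting the current period as the first, the share of the total change completed after s periods is $1 - \lambda^s$, which matches the cumulative shares of the geometric weights. Setting that share equal to one-half and solving gives $s = -\log 2/\log\lambda$ in any logarithm base. The sketch below uses an illustrative function name. It reproduces the two values quoted above to within about 0.001.

```python
import math

def koyck_median_lag(lam):
    """Eq. (17.4.8): median lag = -log 2 / log(lam), for 0 < lam < 1."""
    return -math.log(2.0) / math.log(lam)

for lam in (0.2, 0.8):
    m = koyck_median_lag(lam)
    # after m periods the completed share 1 - lam**m is one-half
    print(lam, round(m, 4), round(1 - lam ** m, 4))
```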


The Mean Lag 

Provided all $\beta_k$ are positive, the mean, or average, lag is defined as

$$\text{Mean lag} = \frac{\sum_{k=0}^{\infty} k\beta_k}{\sum_{k=0}^{\infty} \beta_k} \qquad (17.4.9)$$

which is simply the weighted average of all the lags involved, with the respective β coefficients serving as weights. In short, it is a lag-weighted average of time. For the Koyck model the mean lag is (see Exercise 17.7)


$$\text{Koyck model: Mean lag} = \frac{\lambda}{1 - \lambda} \qquad (17.4.10)$$

Thus, if $\lambda = \tfrac{1}{2}$, the mean lag is 1.
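Eq. (17.4.10) can be checked against definition (17.4.9) by truncating the infinite sums at a long horizon. Because $\beta_0$ cancels from numerator and denominator, it is set to 1. The helper names in this sketch are illustrative.

```python
def mean_lag(weights):
    """Eq. (17.4.9) for a finite list of nonnegative lag weights beta_0, beta_1, ..."""
    return sum(k * w for k, w in enumerate(weights)) / sum(weights)

def koyck_mean_lag(lam):
    """Eq. (17.4.10): lam / (1 - lam)."""
    return lam / (1.0 - lam)

for lam in (0.5, 0.8):
    truncated = mean_lag([lam ** k for k in range(2000)])  # long horizon stands in for infinity
    print(lam, truncated, koyck_mean_lag(lam))
```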

From the preceding discussion it is clear that the median and mean lags serve as summary measures of the speed with which Y responds to X. In the example given in Table 17.1 the mean lag is about 11 quarters, showing that it takes quite some time, on average, for the effect of changes in the money supply to be felt on price changes.


EXAMPLE 17.7 Per Capita Personal Consumption Expenditure (PPCE) and Per Capita Disposable Income (PPDI)


This example examines PPCE in relation to PPDI, both expressed in 2000 dollars, for the 
United States for the period 1959-2006. As an illustration of the Koyck model, consider 
the data given in Table 17.2. Regression of PPCE on PPDI and lagged PPCE gives the 
results shown in Table 17.3. 

The consumption function in this table can be called the short-run consumption function. We will derive the long-run consumption function shortly.

Using the estimated value of $\lambda$, we can compute the distributed lag coefficients. If $\beta_0 \approx 0.2139$, then $\beta_1 = (0.2139)(0.7971) \approx 0.1704$, $\beta_2 = (0.2139)(0.7971)^2 \approx 0.1359$, and so on; these are the short- and medium-term multipliers. Finally, using Eq. (17.4.2), we can obtain the long-run multiplier, that is, the total impact of a change in income on consumption after all lagged effects are taken into account, which in the present example becomes

$$\sum_{k=0}^{\infty} \beta_k = \beta_0\left(\frac{1}{1 - \lambda}\right) \approx 1.0537$$
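The multipliers can be recomputed from the unrounded estimates in Table 17.3. The sketch below treats the coefficient of PPCE(−1), 0.797146, as $\hat\lambda$ and the coefficient of PPDI as $\hat\beta_0$. It lists the first few $\hat\beta_0\hat\lambda^k$ and the long-run multiplier. The multiplier comes out at about 1.054, close to the value reported above; the small difference presumably reflects rounding.

```python
beta0_hat = 0.213890   # coefficient of PPDI, Table 17.3
lam_hat = 0.797146     # coefficient of PPCE(-1), Table 17.3

# Koyck lag coefficients beta_k = beta0 * lam**k (Eq. 17.4.1)
betas = [beta0_hat * lam_hat ** k for k in range(5)]
# long-run multiplier, Eq. (17.4.2)
lrm = beta0_hat / (1.0 - lam_hat)

print([round(b, 4) for b in betas])
print(round(lrm, 4))
```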




TABLE 17.2 PPCE and PPDI, 1959-2006

Year    PPCE      PPDI      Year    PPCE      PPDI
1959     8,776     9,685    1983    15,656    17,828
1960     8,873     9,735    1984    16,343    19,011
1961     8,873     9,901    1985    17,040    19,476
1962     9,170    10,227    1986    17,570    19,906
1963     9,412    10,455    1987    17,994    20,072
1964     9,839    11,061    1988    18,554    20,740
1965    10,331    11,594    1989    18,898    21,120
1966    10,793    12,065    1990    19,067    21,281
1967    10,994    12,457    1991    18,848    21,109
1968    11,510    12,892    1992    19,208    21,548
1969    11,820    13,163    1993    19,593    21,493
1970    11,955    13,563    1994    20,082    21,812
1971    12,256    14,001    1995    20,382    22,153
1972    12,868    14,512    1996    20,835    22,546
1973    13,371    15,345    1997    21,365    23,065
1974    13,148    15,094    1998    22,183    24,131
1975    13,320    15,291    1999    23,050    24,564
1976    13,919    15,738    2000    23,860    25,469
1977    14,364    16,128    2001    24,205    25,687
1978    14,837    16,704    2002    24,612    26,217
1979    15,030    16,931    2003    25,043    26,535
1980    14,816    16,940    2004    25,711    27,232
1981    14,879    17,217    2005    26,277    27,436
1982    14,944    17,418    2006    26,828    28,005

Notes: PPCE = per capita personal consumption expenditure in chained 2000 dollars.
PPDI = per capita personal disposable income in chained 2000 dollars.
Source: Economic Report of the President, 2007, Table B-31.

TABLE 17.3

Dependent Variable: PPCE
Method: Least Squares
Sample (adjusted): 1960-2006
Included observations: 47 after adjustments

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -252.9190     157.3517     -1.607348     0.115
PPDI           0.213890     0.070617     3.028892    0.0041
PPCE(-1)       0.797146     0.073308    10.87389     0.0000

R-squared              0.998216    Mean dependent var.     16691.28
Adjusted R-squared     0.998134    S.D. dependent var.      5205.873
S.E. of regression     224.8504    Akaike info criterion   13.73045
Sum squared resid.     2224539.    Schwarz criterion       13.84854
Log likelihood        -319.6656    Hannan-Quinn criter.    13.77489
F-statistic            12306.99    Durbin-Watson stat.     0.961921
Prob. (F-statistic)    0.000000    Durbin h = 3.8269*

*The calculation of the Durbin h statistic is discussed in Section 17.10.


In words, a sustained increase of a dollar in PPDI will eventually lead to an increase of about 1.05 dollars in PPCE, the immediate, or short-run, impact being only 21 cents.

The long-run consumption function can now be written as:

$$\text{PPCE}_t = -1247.1351 + 1.0537\,\text{PPDI}_t$$

This is obtained by setting $\text{PPCE}_t = \text{PPCE}_{t-1}$ in the short-run consumption function given in Table 17.3, collecting the PPCE terms on the left-hand side, and dividing both sides by $1 - 0.7971 = 0.2029$.¹³

In the long run the marginal propensity to consume (MPC) is about 1. This means that when consumers have had time to adjust to a dollar's increase in PPDI, they will increase their PPCE by almost a dollar. In the short run, however, as Table 17.3 shows, the MPC is only about 21 cents. What is the reason for such a difference between the short- and long-run MPC?

The answer can be found in the median and mean lags. Given $\lambda \approx 0.7971$, the median lag is

$$-\frac{\log 2}{\log \lambda} = -\frac{\log 2}{\log(0.7971)} = 3.0589$$

and the mean lag is

$$\frac{\lambda}{1 - \lambda} = 3.9285$$

It seems real PPCE adjusts to real PPDI with a substantial lag. Recall that the larger the value of $\lambda$ (between 0 and 1), the longer it takes for the full impact of a change in the value of the explanatory variable to be felt on the dependent variable.


17.5 Rationalization of the Koyck Model: The Adaptive Expectations Model

Although very neat, the Koyck model (17.4.7) is ad hoc since it was obtained by a purely algebraic process; it is devoid of any theoretical underpinning. But this gap can be filled if we start from a different perspective. Suppose we postulate the following model:

$$Y_t = \beta_0 + \beta_1 X_t^* + u_t \qquad (17.5.1)$$

where Y = demand for money (real cash balances)
X* = equilibrium, optimum, expected long-run or normal rate of interest
u = error term

Equation (17.5.1) postulates that the demand for money is a function of the expected (i.e., anticipated) rate of interest.

Since the expectational variable $X^*$ is not directly observable, let us propose the following hypothesis about how expectations are formed:

$$X_t^* - X_{t-1}^* = \gamma(X_t - X_{t-1}^*) \qquad (17.5.2)$$ ¹⁴


13 In equilibrium all PPCE values will be the same. Therefore, $\text{PPCE}_t = \text{PPCE}_{t-1}$. Making this substitution, you should get the long-run consumption function.

14 Sometimes the model is expressed as

$$X_t^* - X_{t-1}^* = \gamma(X_{t-1} - X_{t-1}^*)$$
where $\gamma$, such that $0 < \gamma < 1$, is known as the coefficient of expectation. Hypothesis (17.5.2) is known as the adaptive expectation, progressive expectation, or error learning hypothesis, popularized by Cagan¹⁵ and Friedman.¹⁶

What Eq. (17.5.2) implies is that “economic agents will adapt their expectations in the light of past experience and that in particular they will learn from their mistakes.”¹⁷ More specifically, Eq. (17.5.2) states that expectations are revised each period by a fraction $\gamma$ of the gap between the current value of the variable and its previous expected value. Thus, for our model this would mean that expectations about interest rates are revised each period by a fraction $\gamma$ of the discrepancy between the rate of interest observed in the current period and what its anticipated value had been in the previous period. Another way of stating this would be to write Eq. (17.5.2) as

$$X_t^* = \gamma X_t + (1 - \gamma)X_{t-1}^* \qquad (17.5.3)$$


which shows that the expected value of the rate of interest at time t is a weighted average of the actual value of the interest rate at time t and its value expected in the previous period, with weights of $\gamma$ and $1 - \gamma$, respectively. If $\gamma = 1$, $X_t^* = X_t$, meaning that expectations are realized immediately and fully, that is, in the same time period. If, on the other hand, $\gamma = 0$, $X_t^* = X_{t-1}^*$, meaning that expectations are static, that is, “conditions prevailing today will be maintained in all subsequent periods. Expected future values then become identified with current values.”¹⁸
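Eq. (17.5.3) is a one-line updating rule, so its behavior is easy to trace. The sketch below applies it to a rate that jumps permanently from 5 to 7 percent; the function name and the numbers are illustrative assumptions. With $\gamma = 0.5$ the expectation closes half of the remaining gap each period, while $\gamma = 1$ and $\gamma = 0$ reproduce the two limiting cases just described.

```python
def adaptive_expectations(x, gamma, x_star_initial):
    """Apply Eq. (17.5.3), X*_t = gamma*X_t + (1 - gamma)*X*_{t-1}, to the series x."""
    expectations = []
    previous = x_star_initial
    for x_t in x:
        previous = gamma * x_t + (1.0 - gamma) * previous
        expectations.append(previous)
    return expectations

observed = [7.0, 7.0, 7.0, 7.0]                    # rate jumps from 5 to 7 and stays there
print(adaptive_expectations(observed, 0.5, 5.0))  # [6.0, 6.5, 6.75, 6.875]
print(adaptive_expectations(observed, 1.0, 5.0))  # expectations realized immediately
print(adaptive_expectations(observed, 0.0, 5.0))  # static expectations: stays at 5.0
```

Unrolling the recursion shows why footnote 20 describes AE expectations as an exponentially weighted average of past values: the weight on $X_{t-j}$ is $\gamma(1-\gamma)^j$.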

Substituting Eq. (17.5.3) into Eq. (17.5.1), we obtain

$$\begin{aligned} Y_t &= \beta_0 + \beta_1[\gamma X_t + (1 - \gamma)X_{t-1}^*] + u_t \\ &= \beta_0 + \beta_1\gamma X_t + \beta_1(1 - \gamma)X_{t-1}^* + u_t \end{aligned} \qquad (17.5.4)$$


Now lag Eq. (17.5.1) one period, multiply it by $1 - \gamma$, and subtract the product from Eq. (17.5.4). After simple algebraic manipulations, we obtain

$$\begin{aligned} Y_t &= \gamma\beta_0 + \gamma\beta_1 X_t + (1 - \gamma)Y_{t-1} + u_t - (1 - \gamma)u_{t-1} \\ &= \gamma\beta_0 + \gamma\beta_1 X_t + (1 - \gamma)Y_{t-1} + v_t \end{aligned} \qquad (17.5.5)$$

where $v_t = u_t - (1 - \gamma)u_{t-1}$.

Before proceeding any further, let us note the difference between Eq. (17.5.1) and Eq. (17.5.5). In the former, $\beta_1$ measures the average response of $Y$ to a unit change in $X^*$, the equilibrium or long-run value of $X$. In Eq. (17.5.5), on the other hand, $\gamma\beta_1$ measures the average response of $Y$ to a unit change in the actual or observed value of $X$. These responses will not be the same unless, of course, $\gamma = 1$, that is, the current and long-run values of $X$ are the same. In practice, we first estimate Eq. (17.5.5). Once an estimate of $\gamma$ is obtained from the coefficient of lagged $Y$ (which estimates $1 - \gamma$), we can easily compute $\beta_1$ by simply dividing the coefficient of $X_t$ ($= \gamma\beta_1$) by $\gamma$.


15 P. Cagan, "The Monetary Dynamics of Hyperinflations," in M. Friedman (ed.), Studies in the Quantity Theory of Money, University of Chicago Press, Chicago, 1956.

16 Milton Friedman, A Theory of the Consumption Function, National Bureau of Economic Research, 
Princeton University Press, Princeton, NJ, 1957. 

17 G. K. Shaw, Rational Expectations: An Elementary Exposition, St. Martin's Press, New York, 1984, 
p. 25. 

18 Ibid., pp. 19-20.
The similarity between the adaptive expectations model (17.5.5) and the Koyck model (17.4.7) should be readily apparent, although the interpretations of the coefficients in the two models are different. Note that, like the Koyck model, the adaptive expectations model is autoregressive and its error term is similar to the Koyck error term. We shall return to the estimation of the adaptive expectations model in Section 17.8 and to some examples in Section 17.12. Now that we have sketched the adaptive expectations (AE) model, how realistic is it? It is true that it is more appealing than the purely algebraic Koyck approach, but is the AE hypothesis reasonable? In favor of the AE hypothesis one can say the following:

It provides a fairly simple means of modelling expectations in economic theory whilst postulating a mode of behaviour upon the part of economic agents which seems eminently sensible.

The belief that people learn from experience is obviously a more sensible starting point than the implicit assumption that they are totally devoid of memory, characteristic of static expectations thesis. Moreover, the assertion that more distant experiences exert a lesser effect than more recent experience would accord with common sense and would appear to be amply confirmed by simple observation.¹⁹

Until the advent of the rational expectations (RE) hypothesis, initially put forward by J. Muth and later propagated by Robert Lucas and Thomas Sargent, the AE hypothesis was quite popular in empirical economics. The proponents of the RE hypothesis contend that the AE hypothesis is inadequate because it relies solely on the past values of a variable in formulating expectations,²⁰ whereas the RE hypothesis assumes that “individual economic agents use current available and relevant information in forming their expectations and do not rely purely upon past experience.”²¹ In short, the RE hypothesis contends that “expectations are ‘rational’ in the sense that they efficiently incorporate all information available at the time the expectation is formulated”²² and not just the past information.

The criticism directed by the RE proponents against the AE hypothesis is well taken, although there are many critics of the RE hypothesis itself.²³ This is not the place to get bogged down with this rather heady material. Perhaps one could agree with Stephen McNees that “At best, the adaptive expectations assumption can be defended only as a ‘working hypothesis’ proxying for a more complex, perhaps changing expectations formulation mechanism.”²⁴

EXAMPLE 17.8 Example 17.7 Revisited

Since the Koyck transformation underlies the adaptive expectations model, the results presented in Table 17.3 can also be interpreted in terms of Eq. (17.5.5). Thus $\gamma\beta_0 = -252.9190$, $\gamma\beta_1 = 0.21389$, and $(1 - \gamma) = 0.797146$. So the expectation coefficient $\gamma \approx 0.2028$, and, following the preceding discussion about the AE model, we can say that about 20 percent of the discrepancy between actual and expected PPDI is eliminated within a year.
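Following the recipe given after Eq. (17.5.5), the parameters of Eq. (17.5.1) can be recovered by dividing through by $\hat\gamma = 1 - 0.797146$. The sketch below applies this to the Table 17.3 coefficients; the variable names are illustrative.

```python
gamma_beta0 = -252.9190     # intercept in Table 17.3, estimates gamma * beta0
gamma_beta1 = 0.213890      # coefficient of PPDI, estimates gamma * beta1
one_minus_gamma = 0.797146  # coefficient of PPCE(-1), estimates 1 - gamma

gamma = 1.0 - one_minus_gamma
beta0 = gamma_beta0 / gamma   # intercept of Eq. (17.5.1)
beta1 = gamma_beta1 / gamma   # response of PPCE to expected (long-run) PPDI

print(round(gamma, 4), round(beta0, 2), round(beta1, 4))
```

With unrounded inputs, $\hat\gamma$ is about 0.2029. $\hat\beta_1 \approx 1.054$ equals the long-run multiplier of Example 17.7, as it must, because both are the PPDI coefficient divided by one minus the coefficient of PPCE(−1).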


19 Ibid., p. 27.

20 Like the Koyck model, it can be shown that, under AE, expectations of a variable are an exponentially weighted average of past values of that variable.

21 G. K. Shaw, op. cit., p. 47. For additional details of the RE hypothesis, see Steven M. Sheffrin, Rational Expectations, Cambridge University Press, New York, 1983.

22 Stephen K. McNees, "The Phillips Curve: Forward- or Backward-Looking?" New England Economic 
Review, July-August 1979, p. 50. 

23 For a recent critical appraisal of the RE hypothesis, see Michael C. Lovell, "Tests of the Rational Expectations Hypothesis," American Economic Review, March 1986, pp. 110-124.

24 Stephen K. McNees, op. cit., p. 50. 
17.6 Another Rationalization of the Koyck Model: The Stock Adjustment, or Partial Adjustment, Model

The adaptive expectations model is one way of rationalizing the Koyck model. Another rationalization is provided by Marc Nerlove in the so-called stock adjustment or partial adjustment model (PAM).²⁵ To illustrate this model, consider the flexible accelerator model of economic theory, which assumes that there is an equilibrium, optimal, desired, or long-run amount of capital stock needed to produce a given output under the given state of technology, rate of interest, etc. For simplicity assume that this desired level of capital $Y_t^*$ is a linear function of output $X$ as follows:

$$Y_t^* = \beta_0 + \beta_1 X_t + u_t \qquad (17.6.1)$$

Since the desired level of capital is not directly observable, Nerlove postulates the following hypothesis, known as the partial adjustment, or stock adjustment, hypothesis:

$$Y_t - Y_{t-1} = \delta(Y_t^* - Y_{t-1}) \qquad (17.6.2)$$ ²⁶

where $\delta$, such that $0 < \delta < 1$, is known as the coefficient of adjustment and where $Y_t - Y_{t-1}$ = actual change and $(Y_t^* - Y_{t-1})$ = desired change.

Since $Y_t - Y_{t-1}$, the change in capital stock between two periods, is nothing but investment, Eq. (17.6.2) can alternatively be written as

$$I_t = \delta(Y_t^* - Y_{t-1}) \qquad (17.6.3)$$

where $I_t$ = investment in time period t.

Equation (17.6.2) postulates that the actual change in capital stock (investment) in any given time period t is some fraction $\delta$ of the desired change for that period. If $\delta = 1$, it means that the actual stock of capital is equal to the desired stock; that is, actual stock adjusts to the desired stock instantaneously (in the same time period). However, if $\delta = 0$, it means that nothing changes since actual stock at time t is the same as that observed in the previous time period. Typically, $\delta$ is expected to lie between these extremes since adjustment to the desired stock of capital is likely to be incomplete because of rigidity, inertia, contractual obligations, etc., hence the name partial adjustment model. Note that the adjustment mechanism (17.6.2) alternatively can be written as

$$Y_t = \delta Y_t^* + (1 - \delta)Y_{t-1} \qquad (17.6.4)$$

showing that the observed capital stock at time t is a weighted average of the desired capital stock at that time and the capital stock existing in the previous time period, $\delta$ and $(1 - \delta)$ being the weights. Now substitution of Eq. (17.6.1) into Eq. (17.6.4) gives

$$\begin{aligned} Y_t &= \delta(\beta_0 + \beta_1 X_t + u_t) + (1 - \delta)Y_{t-1} \\ &= \delta\beta_0 + \delta\beta_1 X_t + (1 - \delta)Y_{t-1} + \delta u_t \end{aligned} \qquad (17.6.5)$$


25 Marc Nerlove, Distributed Lags and Demand Analysis for Agricultural and Other Commodities, op. cit. 
26 Some authors do not add the stochastic disturbance term $u_t$ to the relation (17.6.1) but add it to this relation, believing that if the former is truly an equilibrium relation, there is no scope for the error term, whereas the adjustment mechanism can be imperfect and may require the disturbance term. In passing, note that Eq. (17.6.2) is sometimes also written as

$$Y_t - Y_{t-1} = \delta(Y_{t-1}^* - Y_{t-1})$$
FIGURE 17.6 The gradual adjustment of the capital stock.



This model is called the partial adjustment model (PAM). 

Since Eq. (17.6.1) represents the long-run, or equilibrium, demand for capital stock, Eq. (17.6.5) can be called the short-run demand function for capital stock, since in the short run the existing capital stock may not necessarily be equal to its long-run level. Once we estimate the short-run function (17.6.5) and obtain the estimate of the adjustment coefficient $\delta$ (from the coefficient of $Y_{t-1}$, which estimates $1 - \delta$), we can easily derive the long-run function by simply dividing $\delta\beta_0$ and $\delta\beta_1$ by $\delta$ and omitting the lagged Y term, which will then give Eq. (17.6.1).

Geometrically, the partial adjustment model can be shown as in Figure 17.6.²⁷ In this figure $Y^*$ is the desired capital stock and $Y_1$ the current actual capital stock. For illustrative purposes assume that $\delta = 0.5$. This implies that the firm plans to close half the gap between the actual and the desired stock of capital each period. Thus, in the first period it moves to $Y_2$, with investment equal to $(Y_2 - Y_1)$, which in turn is equal to half of $(Y^* - Y_1)$. In each subsequent period it closes half the gap between the capital stock at the beginning of the period and the desired capital stock $Y^*$.
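The gap-closing path in Figure 17.6 is easy to trace with Eq. (17.6.4) when the desired stock is held fixed. The sketch below assumes the illustrative values $Y_1 = 100$, $Y^* = 200$, and $\delta = 0.5$; only $\delta = 0.5$ comes from the text. Each period's investment equals half of the remaining gap.

```python
def partial_adjustment_path(y_initial, y_desired, delta, periods):
    """Iterate Eq. (17.6.4), Y_t = delta*Y* + (1 - delta)*Y_{t-1}, with Y* held constant."""
    path = [y_initial]
    for _ in range(periods):
        path.append(delta * y_desired + (1.0 - delta) * path[-1])
    return path

stock = partial_adjustment_path(100.0, 200.0, 0.5, 4)
investment = [after - before for before, after in zip(stock, stock[1:])]  # I_t, Eq. (17.6.3)
print(stock)        # [100.0, 150.0, 175.0, 187.5, 193.75]
print(investment)   # [50.0, 25.0, 12.5, 6.25]
```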

The partial adjustment model resembles both the Koyck and adaptive expectations models in that it is autoregressive. But it has a much simpler disturbance term: the original disturbance term $u_t$ multiplied by a constant $\delta$. But bear in mind that although similar in appearance, the adaptive expectations and partial adjustment models are conceptually very different. The former is based on uncertainty (about the future course of prices, interest rates, etc.), whereas the latter is due to technical or institutional rigidities, inertia, cost of change, etc. However, both of these models are theoretically much sounder than the Koyck model.

Since in appearance the adaptive expectations and partial adjustment models are indistinguishable, the $\gamma$ coefficient of 0.2028 of the adaptive expectations model can also be interpreted as the $\delta$ coefficient of the stock adjustment model if we assume that the latter model is operative in the present case (i.e., it is the desired or expected PPCE that is linearly related to the current PPDI).

The important point to keep in mind is that since the Koyck, adaptive expectations, and stock adjustment models, apart from the difference in the appearance of the error term, yield the same final estimating model, a researcher must be extremely careful in telling the reader which model he or she is using and why. Thus, researchers must specify the theoretical underpinning of their model.

27 This is adapted from Figure 7.4 from Rudiger Dornbusch and Stanley Fischer, Macroeconomics, 3d 
ed., McGraw-Hill, New York, 1984, p. 216. 
*17.7 Combination of Adaptive Expectations and Partial Adjustment Models

Consider the following model: 

$$Y_t^* = \beta_0 + \beta_1 X_t^* + u_t \qquad (17.7.1)$$

where $Y^*$ = desired stock of capital and $X^*$ = expected level of output.

Since neither $Y^*$ nor $X^*$ is directly observable, one could use the partial adjustment mechanism for $Y^*$ and the adaptive expectations model for $X^*$ to arrive at the following estimating equation (see Exercise 17.2):

$$\begin{aligned} Y_t &= \beta_0\delta\gamma + \beta_1\delta\gamma X_t + [(1 - \gamma) + (1 - \delta)]Y_{t-1} \\ &\quad - (1 - \delta)(1 - \gamma)Y_{t-2} + [\delta u_t - \delta(1 - \gamma)u_{t-1}] \\ &= \alpha_0 + \alpha_1 X_t + \alpha_2 Y_{t-1} + \alpha_3 Y_{t-2} + v_t \end{aligned} \qquad (17.7.2)$$

where $v_t = \delta[u_t - (1 - \gamma)u_{t-1}]$. This model too is autoregressive, the only difference from the purely adaptive expectations model being that $Y_{t-2}$ appears along with $Y_{t-1}$ as an explanatory variable. Like the Koyck and AE models, the error term in Eq. (17.7.2) follows a moving average process. Another feature of this model is that although the model is linear in the α's, it is nonlinear in the original parameters.
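One way to see the nonlinearity is to map the α's back to the original parameters. Since $\alpha_2 = (1-\gamma) + (1-\delta)$ and $\alpha_3 = -(1-\delta)(1-\gamma)$, the quantities $1-\gamma$ and $1-\delta$ are the two roots of $z^2 - \alpha_2 z - \alpha_3 = 0$. It follows that $\beta_0 = \alpha_0/(\delta\gamma)$ and $\beta_1 = \alpha_1/(\delta\gamma)$. The sketch below implements this mapping as an illustration; it is not the procedure of Exercise 17.10, and the function name and numbers are hypothetical. Because $\alpha_2$ and $\alpha_3$ are symmetric in $\gamma$ and $\delta$, the α's alone cannot say which root belongs to which coefficient.

```python
import math

def structural_from_alphas(a0, a1, a2, a3):
    """Map the alphas of Eq. (17.7.2) back to (first, second, beta0, beta1).

    (1 - gamma) and (1 - delta) solve z**2 - a2*z - a3 = 0 (sum a2, product -a3).
    Which root is gamma and which is delta cannot be determined from a2 and a3,
    so 'first' is simply the value built from the larger root.
    """
    disc = a2 ** 2 + 4.0 * a3
    if disc < 0:
        raise ValueError("complex roots: alphas inconsistent with Eq. (17.7.2)")
    root_hi = (a2 + math.sqrt(disc)) / 2.0
    root_lo = (a2 - math.sqrt(disc)) / 2.0
    first, second = 1.0 - root_hi, 1.0 - root_lo
    scale = first * second            # equals delta * gamma whichever labeling is used
    return first, second, a0 / scale, a1 / scale

# Round trip with illustrative values gamma = 0.4, delta = 0.7, beta0 = 10, beta1 = 0.5
g, d, b0, b1 = 0.4, 0.7, 10.0, 0.5
alphas = (b0 * d * g, b1 * d * g, (1 - g) + (1 - d), -(1 - d) * (1 - g))
print(structural_from_alphas(*alphas))   # approximately (0.4, 0.7, 10.0, 0.5)
```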

A celebrated application of Eq. (17.7.1) has been Friedman’s permanent income 
hypothesis, which states that “permanent” or long-run consumption is a function of 
“permanent” or long-run income. 28 

The estimation of Eq. (17.7.2) presents the same estimation problems as the Koyck or the AE model in that all these models are autoregressive with similar error structures. In addition, Eq. (17.7.2) involves some nonlinear estimation problems that we consider briefly in Exercise 17.10 but do not delve into in this book.


17.8 Estimation of Autoregressive Models

From our discussion thus far we have the following three models:

Koyck

$$Y_t = \alpha(1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + v_t \qquad (17.4.7)$$

Adaptive expectations

$$Y_t = \gamma\beta_0 + \gamma\beta_1 X_t + (1 - \gamma)Y_{t-1} + [u_t - (1 - \gamma)u_{t-1}] \qquad (17.5.5)$$

Partial adjustment

$$Y_t = \delta\beta_0 + \delta\beta_1 X_t + (1 - \delta)Y_{t-1} + \delta u_t \qquad (17.6.5)$$


*Optional.

28 Milton Friedman, A Theory of the Consumption Function, Princeton University Press, Princeton, N.J., 1957.
All these models have the following common form:

$$Y_t = \alpha_0 + \alpha_1 X_t + \alpha_2 Y_{t-1} + v_t \qquad (17.8.1)$$

that is, they are all autoregressive in nature. Therefore, we must now look at the estimation 
problem of such models, because the classical least-squares theory may not be directly 
applicable to them. The reason is twofold: the presence of stochastic explanatory 
variables and the possibility of serial correlation. 

Now, as noted previously, for the application of the classical least-squares theory, it must be shown that the stochastic explanatory variable $Y_{t-1}$ is distributed independently of the disturbance term $v_t$. To determine whether this is so, it is essential to know the properties of $v_t$. If we assume that the original disturbance term $u_t$ satisfies all the classical assumptions, such as $E(u_t) = 0$, $\text{var}(u_t) = \sigma^2$ (the assumption of homoscedasticity), and $\text{cov}(u_t, u_{t+s}) = 0$ for $s \neq 0$ (the assumption of no autocorrelation), $v_t$ may not inherit all these properties. Consider, for example, the error term in the Koyck model, which is $v_t = (u_t - \lambda u_{t-1})$. Given the assumptions about $u_t$, we can easily show that $v_t$ is serially correlated because

$$E(v_t v_{t-1}) = -\lambda\sigma^2 \qquad (17.8.2)$$ ²⁹

which is nonzero (unless X happens to be zero). And since 7,_i appears in the Koyck model 
as an explanatory variable, it is bound to be correlated with v t (via the presence of u t -\ in it). 
As a matter of fact, it can be shown that 

cov [Y t -u (u, - Xu,- 1)] = -Xa 2 (17.8.3) 

which is the same as Eq. (17.8.2). The reader can verify that the same holds true of the 
adaptive expectations model. 
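The two moment results, Eqs. (17.8.2) and (17.8.3), can be checked by simulation. The sketch below is a minimal illustration under assumptions of our own choosing (not from the text): the Koyck model is generated from the infinite geometric lag with λ = 0.6, σ = 1, α = 1, β0 = 0.5, normal errors, and a fixed sine-wave X_t. The function name `koyck_error_moments` is hypothetical. Both sample moments should come out close to -λσ² = -0.6.

```python
import math
import random


def koyck_error_moments(lam=0.6, sigma=1.0, n=100_000, seed=12345):
    """Return sample estimates of E(v_t v_{t-1}) and cov(Y_{t-1}, v_t)
    for the Koyck error v_t = u_t - lam * u_{t-1}.

    Illustrative data-generating process (an assumption, not from the text):
    Y_t = alpha + beta0 * sum_k lam**k * X_{t-k} + u_t, with u_t i.i.d.
    N(0, sigma^2) and a fixed, nonstochastic X_t.
    """
    rng = random.Random(seed)
    alpha, beta0 = 1.0, 0.5
    u = [rng.gauss(0.0, sigma) for _ in range(n)]
    x = [10.0 + math.sin(t / 7.0) for t in range(n)]

    # The geometric lag sum S_t = X_t + lam * S_{t-1} is built recursively.
    y, s = [], 0.0
    for t in range(n):
        s = x[t] + lam * s
        y.append(alpha + beta0 * s + u[t])

    # v[t] holds v_t for t >= 1; v[0] is only a placeholder.
    v = [0.0] + [u[t] - lam * u[t - 1] for t in range(1, n)]

    # E(v_t v_{t-1}): v_t has mean zero, so a raw average of products is used.
    auto = sum(v[t] * v[t - 1] for t in range(2, n)) / (n - 2)

    # Sample covariance between the regressor Y_{t-1} and the error v_t.
    ylag, vcur = y[:-1], v[1:]
    my, mv = sum(ylag) / len(ylag), sum(vcur) / len(vcur)
    cov = sum((a - my) * (b - mv) for a, b in zip(ylag, vcur)) / len(ylag)
    return auto, cov


auto, cov = koyck_error_moments()
print(f"E(v_t v_(t-1)) = {auto:.3f}, cov(Y_(t-1), v_t) = {cov:.3f}, -lam*sigma^2 = -0.600")
```

Setting `lam=0.0` makes both moments vanish, which is the "unless λ happens to be zero" case in the text.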

What is the implication of the finding that in the Koyck model as well as the adaptive expectations model the stochastic explanatory variable $Y_{t-1}$ is correlated with the error term $v_t$? As noted previously, if an explanatory variable in a regression model is correlated with the stochastic disturbance term, the OLS estimators are not only biased but also inconsistent; that is, even if the sample size is increased indefinitely, the estimators do not approximate their true population values. 30 Therefore, estimation of the Koyck and adaptive expectations models by the usual OLS procedure may yield seriously misleading results.

The partial adjustment model is different, however. In this model $v_t = \delta u_t$, where $0 < \delta < 1$. Therefore, if $u_t$ satisfies the assumptions of the classical linear regression model given previously, so will $\delta u_t$. Thus, OLS estimation of the partial adjustment model will yield consistent estimates although the estimates tend to be biased (in finite or small samples). 31 Intuitively, the reason for consistency is this: Although $Y_{t-1}$ depends on $u_{t-1}$ and all the previous disturbance terms, it is not related to the current error term $u_t$. Therefore, as long as $u_t$ is serially independent, $Y_{t-1}$ will also be independent of, or at least uncorrelated with, $u_t$, thereby satisfying an important assumption of OLS, namely, noncorrelation between the explanatory variable(s) and the stochastic disturbance term.

29 $E(v_t v_{t-1}) = E[(u_t - \lambda u_{t-1})(u_{t-1} - \lambda u_{t-2})] = -\lambda E(u_{t-1}^2)$, since covariances between the u's are zero by assumption; that is, $E(v_t v_{t-1}) = -\lambda\sigma^2$.

30 The proof is beyond the scope of this book and may be found in Griliches, op. cit., pp. 36-38. However, see Chapter 18 for an outline of the proof in another context. See also Asatoshi Maeshiro, "Teaching Regressions with a Lagged Dependent Variable and Autocorrelated Disturbances," The Journal of Economic Education, Winter 1996, vol. 27, no. 1, pp. 72-84.

31 For proof, see J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, pp. 360-362. See also H. E. Doran and J. W. B. Guise, Single Equation Methods in Econometrics: Applied Regression Analysis, University of New England Teaching Monograph Series 3, Armidale, NSW, Australia, 1984, pp. 236-244.
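The consistency argument for the partial adjustment model can likewise be illustrated numerically. The sketch assumes δ = 0.4, β0 = 2, β1 = 0.8, exogenous X_t drawn uniformly on [5, 15], and standard normal u_t (all hypothetical choices); it fits Eq. (17.6.5) by OLS and recovers δ as one minus the coefficient of Y_{t-1}. The helper `solve3` is a small hand-written Gaussian-elimination routine, used only to keep the example free of third-party libraries.

```python
import random


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def ols_partial_adjustment(delta=0.4, beta0=2.0, beta1=0.8, n=50_000, seed=7):
    """Simulate Eq. (17.6.5) and fit it by OLS.

    Returns (delta_hat, beta1_hat): delta_hat = 1 - (coefficient of Y_{t-1}),
    beta1_hat = (coefficient of X_t) / delta_hat.
    """
    rng = random.Random(seed)
    x = [rng.uniform(5.0, 15.0) for _ in range(n)]  # exogenous regressor
    y = [beta0 + beta1 * x[0]]                       # start at the desired level
    for t in range(1, n):
        u = rng.gauss(0.0, 1.0)
        y.append(delta * beta0 + delta * beta1 * x[t]
                 + (1.0 - delta) * y[t - 1] + delta * u)

    # Accumulate the OLS normal equations for the regressors (1, X_t, Y_{t-1}).
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for t in range(1, n):
        z = (1.0, x[t], y[t - 1])
        for i in range(3):
            xty[i] += z[i] * y[t]
            for j in range(3):
                xtx[i][j] += z[i] * z[j]

    _, a1, a2 = solve3(xtx, xty)
    d_hat = 1.0 - a2
    return d_hat, a1 / d_hat


delta_hat, beta1_hat = ols_partial_adjustment()
print(f"delta_hat = {delta_hat:.3f} (true 0.4), beta1_hat = {beta1_hat:.3f} (true 0.8)")
```

With a large sample the estimates should lie close to the true values; rerunning with a small `n` would typically show the finite-sample bias mentioned in the text.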

Although OLS estimation of the stock, or partial, adjustment model provides consistent estimates because of the simple structure of the error term in such a model, one should not adopt it in place of the Koyck or adaptive expectations model simply for that reason. 32 The reader is strongly advised against doing so. A model should be chosen on the basis of strong theoretical considerations, not simply because it leads to easy statistical estimation. Every model should be considered on its own merit, paying due attention to the stochastic disturbances appearing therein. If in models such as the Koyck or adaptive expectations model OLS cannot be straightforwardly applied, methods need to be devised to resolve the estimation problem. Several alternative estimation methods are available although some of them may be computationally tedious. In the following section we consider one such method.

17.9 The Method of Instrumental Variables (IV) 

The reason why OLS cannot be applied to the Koyck or adaptive expectations model is that the explanatory variable $Y_{t-1}$ tends to be correlated with the error term $v_t$. If somehow this correlation can be removed, one can apply OLS to obtain consistent estimates, as noted previously. (Note: There will be some small-sample bias.) How can this be accomplished? Liviatan has proposed the following solution. 33

Let us suppose that we find a proxy for $Y_{t-1}$ that is highly correlated with $Y_{t-1}$ but is uncorrelated with $v_t$, where $v_t$ is the error term appearing in the Koyck or adaptive expectations model. Such a proxy is called an instrumental variable (IV). 34 Liviatan suggests $X_{t-1}$ as the instrumental variable for $Y_{t-1}$ and further suggests that the parameters of the regression (17.8.1) can be obtained by solving the following normal equations:

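$$\begin{aligned}
\sum Y_t &= n\hat{\alpha}_0 + \hat{\alpha}_1\sum X_t + \hat{\alpha}_2\sum Y_{t-1} \\
\sum Y_t X_t &= \hat{\alpha}_0\sum X_t + \hat{\alpha}_1\sum X_t^2 + \hat{\alpha}_2\sum Y_{t-1}X_t \\
\sum Y_t X_{t-1} &= \hat{\alpha}_0\sum X_{t-1} + \hat{\alpha}_1\sum X_t X_{t-1} + \hat{\alpha}_2\sum Y_{t-1}X_{t-1}
\end{aligned} \qquad (17.9.1)$$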

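Notice that if we were to apply OLS directly to Eq. (17.8.1), the usual OLS normal equations would be (see Section 7.4):

$$\begin{aligned}
\sum Y_t &= n\hat{\alpha}_0 + \hat{\alpha}_1\sum X_t + \hat{\alpha}_2\sum Y_{t-1} \\
\sum Y_t X_t &= \hat{\alpha}_0\sum X_t + \hat{\alpha}_1\sum X_t^2 + \hat{\alpha}_2\sum Y_{t-1}X_t \\
\sum Y_t Y_{t-1} &= \hat{\alpha}_0\sum Y_{t-1} + \hat{\alpha}_1\sum X_t Y_{t-1} + \hat{\alpha}_2\sum Y_{t-1}^2
\end{aligned} \qquad (17.9.2)$$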

The difference between the two sets of normal equations should be readily apparent. Liviatan has shown that the α's estimated from Eq. (17.9.1) are consistent, whereas those estimated from Eq. (17.9.2) may not be consistent because $Y_{t-1}$ and $v_t$ [$= u_t - \lambda u_{t-1}$ or $u_t - (1-\gamma)u_{t-1}$] may be correlated whereas $X_t$ and $X_{t-1}$ are uncorrelated with $v_t$. (Why?)

32 Also, as J. Johnston notes (op. cit., p. 350), "[the] pattern of adjustment [suggested by the partial adjustment model] . . . may sometimes be implausible."

33 N. Liviatan, "Consistent Estimation of Distributed Lags," International Economic Review, vol. 4, January 1963, pp. 44-52.

34 Such instrumental variables are used frequently in simultaneous equation models (see Chapter 20).
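The contrast between the normal equations (17.9.1) and (17.9.2) can be made concrete with a small simulation. In this sketch (an illustration under assumed parameter values, not Liviatan's own computation) the Koyck model is generated with λ = 0.6, α = 5, β0 = 1, X_t drawn independently from a uniform distribution on [8, 12], and v_t = u_t - λu_{t-1}. OLS uses the regressors (1, X_t, Y_{t-1}); the IV estimator keeps the same regressors but uses the instruments (1, X_t, X_{t-1}), which is exactly the structure of Eq. (17.9.1). The names `solve3` and `koyck_ols_vs_iv` are hypothetical helpers.

```python
import random


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def koyck_ols_vs_iv(lam=0.6, alpha=5.0, beta0=1.0, n=30_000, seed=2024):
    """Return (OLS, IV) estimates of (alpha0, alpha1, alpha2) in Eq. (17.8.1)
    for simulated Koyck data; alpha2 estimates lam."""
    rng = random.Random(seed)
    x = [rng.uniform(8.0, 12.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [alpha + beta0 * 10.0 / (1.0 - lam) + u[0]]  # start near the steady state
    for t in range(1, n):
        v = u[t] - lam * u[t - 1]                     # Koyck error term
        y.append(alpha * (1.0 - lam) + beta0 * x[t] + lam * y[t - 1] + v)

    a_ols = [[0.0] * 3 for _ in range(3)]
    b_ols = [0.0] * 3
    a_iv = [[0.0] * 3 for _ in range(3)]
    b_iv = [0.0] * 3
    for t in range(1, n):
        z = (1.0, x[t], y[t - 1])  # regressors
        w = (1.0, x[t], x[t - 1])  # instruments: X_{t-1} stands in for Y_{t-1}
        for i in range(3):
            b_ols[i] += z[i] * y[t]       # Eq. (17.9.2), left-hand sides
            b_iv[i] += w[i] * y[t]        # Eq. (17.9.1), left-hand sides
            for j in range(3):
                a_ols[i][j] += z[i] * z[j]
                a_iv[i][j] += w[i] * z[j]
    return solve3(a_ols, b_ols), solve3(a_iv, b_iv)


ols, iv = koyck_ols_vs_iv()
print(f"estimate of lambda: OLS {ols[2]:.3f}, IV {iv[2]:.3f} (true 0.6)")
```

Under these assumptions the OLS estimate of λ is pulled well below 0.6 (the negative covariance in Eq. (17.8.3) biases it downward), while the IV estimate stays close to 0.6. Because X_t is drawn independently here, the example sidesteps the multicollinearity between X_t and X_{t-1} that real time series usually exhibit.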

Although easy to apply in practice once a suitable proxy is found, the Liviatan technique is likely to suffer from the multicollinearity problem because $X_t$ and $X_{t-1}$, which enter in the normal equations of (17.9.1), are likely to be highly correlated (as noted in Chapter 12, most economic time series typically exhibit a high degree of correlation between successive values). The implication, then, is that although the Liviatan procedure yields consistent estimates, the estimators are likely to be inefficient. 35
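How strong the correlation between successive values of an economic time series can be is easy to check. As an illustration of our own (not a computation in the text), the sketch computes the correlation between GDP_t and GDP_{t-1} for the Sri Lankan GDP series reported in Table 17.7; a value very close to 1 is what makes X_t and X_{t-1} nearly collinear in the normal equations (17.9.1). The helper `pearson` is a hypothetical name.

```python
# Real GDP, Sri Lanka, 1967-1993 (GDP column of Table 17.7)
gdp = [78221, 83326, 90490, 92692, 94814, 92590, 101419, 105267, 112149,
       116078, 122040, 128578, 136851, 144734, 152846, 164318, 172414,
       178433, 185753, 192059, 191288, 196055, 202477, 223225, 233231,
       242762, 259555]


def pearson(a, b):
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


r = pearson(gdp[1:], gdp[:-1])  # pairs (X_t, X_{t-1}) for t = 1968, ..., 1993
print(f"corr(GDP_t, GDP_(t-1)) = {r:.4f}")
```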

Before we move on, the obvious question is: How does one find a “good” proxy for $Y_{t-1}$ in such a way that, although highly correlated with $Y_{t-1}$, it is uncorrelated with $v_t$? There are some suggestions in the literature, which we take up by way of an exercise (see Exercise 17.5). But it must be stated that finding good proxies is not always easy, in which case the IV method is of little practical use and one may have to resort to maximum likelihood estimation techniques, which are beyond the scope of this book. 36

Is there a test one can use to find out if the chosen instrument(s) is valid? Dennis Sargan 
has developed a test, dubbed the SARG test, for this purpose. The test is described in 
Appendix 17A, Section 17A.1. 


17.10 Detecting Autocorrelation in Autoregressive Models: 
Durbin h Test 


As we have seen, the likely serial correlation in the errors $v_t$ makes the estimation problem in the autoregressive model rather complex: In the stock adjustment model the error term $v_t$ did not have (first-order) serial correlation if the error term $u_t$ in the original model was serially uncorrelated, whereas in the Koyck and adaptive expectations models $v_t$ was serially correlated even if $u_t$ was serially independent. The question, then, is: How does one know if there is serial correlation in the error term appearing in the autoregressive models?

As noted in Chapter 12, the Durbin-Watson d statistic may not be used to detect (first-order) serial correlation in autoregressive models, because the computed d value in such models generally tends toward 2, which is the value of d expected in a truly random sequence. In other words, if we routinely compute the d statistic for such models, there is a built-in bias against discovering (first-order) serial correlation. Despite this, many researchers compute the d value for want of anything better. However, Durbin himself has proposed a large-sample test of first-order serial correlation in autoregressive models. 37 This test is called the h statistic.

We have already discussed the Durbin h test in Exercise 12.36. For convenience, we reproduce the h statistic (with a slight change in notation):

$$h = \hat{\rho}\sqrt{\frac{n}{1 - n[\operatorname{var}(\hat{\alpha}_2)]}} \qquad (17.10.1)$$


35 To see how the efficiency of the estimators can be improved, consult Lawrence R. Klein, A Textbook of Econometrics, 2d ed., Prentice-Hall, Englewood Cliffs, N.J., 1974, p. 99. See also William H. Greene, Econometric Analysis, 2d ed., Macmillan, New York, 1993, pp. 535-538.

36 For a condensed discussion of the ML methods, see J. Johnston, op. cit., pp. 366-371, as well as Appendix 4A and Appendix 15A.

37 J. Durbin, "Testing for Serial Correlation in Least-Squares Regression When Some of the Regressors Are Lagged Dependent Variables," Econometrica, vol. 38, 1970, pp. 410-421.
where n is the sample size, $\operatorname{var}(\hat{\alpha}_2)$ is the variance of the coefficient of the lagged $Y_t$ ($= Y_{t-1}$) in Eq. (17.8.1), and $\hat{\rho}$ is an estimate of the first-order serial correlation $\rho$, first discussed in Chapter 12.

As noted in Exercise 12.36, for a large sample, Durbin has shown that, under the null hypothesis that $\rho = 0$, the h statistic of Eq. (17.10.1) follows the standard normal distribution. That is,

$$h \overset{\text{asy}}{\sim} N(0, 1) \qquad (17.10.2)$$

where asy means asymptotically.

In practice, as noted in Chapter 12, one can estimate $\rho$ as

$$\hat{\rho} \approx 1 - \frac{d}{2} \qquad (17.10.3)$$


It is interesting to observe that although we cannot use the Durbin d to test for autocorrelation in autoregressive models, we can use it as an input in computing the h statistic.

Let us illustrate the use of the h statistic with our Example 17.7. In this example, n = 47, $\hat{\rho} \approx (1 - d/2) = 0.5190$ (note: d = 0.9619), and $\operatorname{var}(\hat{\alpha}_2) = \operatorname{var}(\text{PPCE}_{t-1}) = (0.0733)^2 = 0.0053$. Putting these values in Eq. (17.10.1), we obtain:

$$h = 0.5190\sqrt{\frac{47}{1 - 47(0.0053)}} \approx 4.1061 \qquad (17.10.4)$$

Since this h value has the standard normal distribution under the null hypothesis, the probability of obtaining such a high h value is very small. Recall that the probability that a standard normal variate exceeds 3 in absolute value is extremely small (about 0.003). In the present example our conclusion, then, is that there is (positive) autocorrelation. Of course, bear in mind that h follows the standard normal distribution only asymptotically, although our sample of 47 observations is reasonably large.
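Equations (17.10.1) and (17.10.3) translate directly into a few lines of code. The sketch below (the function names are our own) computes h from n, d, and var(α̂2), refuses to proceed when n var(α̂2) ≥ 1, and reports a two-sided p value from the standard normal distribution. Applied to Example 17.7 it gives h ≈ 4.106; the last digit differs slightly from Eq. (17.10.4) because ρ̂ is computed from d rather than taken as the rounded value 0.5190.

```python
import math


def durbin_h(n, d, var_a2):
    """Durbin h statistic, Eq. (17.10.1), with rho estimated as 1 - d/2 (Eq. 17.10.3).

    n      -- sample size
    d      -- Durbin-Watson d from the fitted autoregressive model
    var_a2 -- estimated variance of the coefficient of Y_{t-1}
    """
    if n * var_a2 >= 1.0:
        # The quantity under the square root would be undefined or negative.
        raise ValueError("h cannot be computed when n * var(a2) >= 1")
    rho_hat = 1.0 - d / 2.0
    return rho_hat * math.sqrt(n / (1.0 - n * var_a2))


def two_sided_normal_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


h = durbin_h(n=47, d=0.9619, var_a2=0.0053)  # inputs from Example 17.7
print(f"h = {h:.4f}, two-sided p = {two_sided_normal_p(h):.2e}")
```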

Note these features of the h statistic.

1. It does not matter how many X variables or how many lagged values of Y are included in the regression model. To compute h, we need consider only the variance of the coefficient of lagged $Y_{t-1}$.

2. The test is not applicable if $[n \operatorname{var}(\hat{\alpha}_2)]$ exceeds 1. (Why?) In practice, though, this does not usually happen.

3. Since the test is a large-sample test, its application in small samples is not strictly justified, as shown by Inder 38 and Kiviet. 39 It has been suggested that the Breusch-Godfrey (BG) test, also known as the Lagrange multiplier test, discussed in Chapter 12, is statistically more powerful not only in large samples but also in finite, or small, samples and is therefore preferable to the h test. 40

The conclusion based on the h test that our model suffers from autocorrelation is confirmed by the Breusch-Godfrey (BG) test, which is shown in Equation (12.6.17). Using the seven lagged values of the residuals estimated from the regression shown in Table 17.3, the BG test shown in Eq. (12.6.18) obtained a $\chi^2$ value of 15.3869. For seven degrees of freedom (the number of lagged residuals used in the BG test), the probability of obtaining a chi-square value as large as 15.38 or greater is about 3 percent, which is quite low.

For this reason, we need to correct the standard errors shown in Table 17.3, which can be done by the Newey-West HAC procedure discussed in Chapter 12. The results are as shown in Table 17.4.

TABLE 17.4

Dependent Variable: PCE
Method: Least Squares
Sample (adjusted): 1960-2006
Included observations: 47 after adjustments
Newey-West HAC Standard Errors & Covariance (lag truncation = 3)

                          Coefficient    Std. Error    t-Statistic    Prob.
Intercept                 -252.9190      168.46        -1.501390      0.1404
Income (X_t)              0.213890       0.05124       4.173888       0.0001
Lagged PCE (Y_t-1)        0.797146       0.051825      15.38148       0.0000

R-squared              0.998216      Mean dependent var.      16691.28
Adjusted R-squared     0.998134      S.D. dependent var.      5205.873
S.E. of regression     224.8504      Akaike info criterion    13.73045
Sum squared resid.     2224539.      Schwarz criterion        13.84854
Log likelihood         -319.6656     Hannan-Quinn criter.     13.77489
F-statistic            12306.99      Durbin-Watson stat.      0.961921
Prob(F-statistic)      0.000000

38 B. Inder, "An Approximation to the Null Distribution of the Durbin-Watson Statistic in Models Containing Lagged Dependent Variables," Econometric Theory, vol. 2, no. 3, 1986, pp. 413-428.

39 J. F. Kiviet, "On the Vigour of Some Misspecification Tests for Modelling Dynamic Relationships," Review of Economic Studies, vol. 53, no. 173, 1986, pp. 241-262.

40 Gabor Korosi, Laszlo Matyas, and Istvan P. Szekely, Practical Econometrics, Ashgate Publishing Company, Brookfield, Vermont, 1992, p. 92.
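The 3 percent figure quoted for the BG statistic can be checked without statistical tables. The sketch computes the upper-tail chi-square probability through the standard identity P(χ² with df degrees of freedom ≥ s) = 1 - P(df/2, s/2), where P(a, x) is the regularized lower incomplete gamma function, evaluated here by its power series. It uses only Python's `math` module; the name `chi2_sf` is our own, and this is a minimal illustration for moderate values of s rather than a production-quality routine.

```python
import math


def chi2_sf(stat, df, tol=1e-15, max_iter=10_000):
    """Upper-tail probability P(chi2_df >= stat).

    Uses the series P(a, x) = x**a * exp(-x) / Gamma(a) * sum_k x**k / (a (a+1) ... (a+k))
    for the regularized lower incomplete gamma function, with a = df/2, x = stat/2.
    """
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = 1.0 / a          # k = 0 term of the series
    total = term
    k = 0
    while abs(term) > tol * abs(total) and k < max_iter:
        k += 1
        term *= x / (a + k)  # next term: multiply by x / (a + k)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


p_bg = chi2_sf(15.3869, 7)  # BG statistic with seven lagged residuals
print(f"P(chi2_7 >= 15.3869) = {p_bg:.4f}")
```

The result should be roughly 0.03, in line with the text. For two degrees of freedom the function reduces to exp(-s/2), which gives a quick sanity check.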

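It seems that OLS misstates the standard errors of the regression coefficients; for the lagged consumption coefficient, for instance, the HAC standard error (0.051825) differs noticeably from the OLS value of 0.0733.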

17.11 A Numerical Example: The Demand for Money
in Canada, 1979-I to 1988-IV

To illustrate the use of the models we have discussed thus far, consider one of the earlier empirical applications, namely, the demand for money (or real cash balances). In particular, consider the following model. 41

$$M_t^* = \beta_0 R_t^{\beta_1} Y_t^{\beta_2} e^{u_t} \qquad (17.11.1)$$

where $M_t^*$ = desired, or long-run, demand for money (real cash balances)
$R_t$ = long-term interest rate, %
$Y_t$ = aggregate real national income

For statistical estimation, Eq. (17.11.1) may be expressed conveniently in log form as

$$\ln M_t^* = \ln\beta_0 + \beta_1\ln R_t + \beta_2\ln Y_t + u_t \qquad (17.11.2)$$


41 For a similar model, see Gregory C. Chow, "On the Long-Run and Short-Run Demand for Money," Journal of Political Economy, vol. 74, no. 2, 1966, pp. 111-131. Note that one advantage of the multiplicative function is that the exponents of the variables give direct estimates of elasticities (see Chapter 6).
Since the desired demand variable is not directly observable, let us assume the stock adjustment hypothesis, namely,

$$\frac{M_t}{M_{t-1}} = \left(\frac{M_t^*}{M_{t-1}}\right)^{\delta}, \qquad 0 < \delta < 1 \qquad (17.11.3)$$

Equation (17.11.3) states that a constant percentage (why?) of the discrepancy between the actual and desired real cash balances is eliminated within a single period (year). In log form, Eq. (17.11.3) may be expressed as


$$\ln M_t - \ln M_{t-1} = \delta(\ln M_t^* - \ln M_{t-1}) \qquad (17.11.4)$$


Substituting $\ln M_t^*$ from Eq. (17.11.2) into Eq. (17.11.4) and rearranging, we obtain

$$\ln M_t = \delta\ln\beta_0 + \beta_1\delta\ln R_t + \beta_2\delta\ln Y_t + (1-\delta)\ln M_{t-1} + \delta u_t \qquad (17.11.5)^{42}$$

which may be called the short-run demand function for money. (Why?) 
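The substitution that leads from Eqs. (17.11.2) and (17.11.4) to Eq. (17.11.5) can be written out step by step:

```latex
\begin{aligned}
\ln M_t - \ln M_{t-1} &= \delta\left[(\ln\beta_0 + \beta_1\ln R_t + \beta_2\ln Y_t + u_t) - \ln M_{t-1}\right] \\
\ln M_t &= \delta\ln\beta_0 + \beta_1\delta\ln R_t + \beta_2\delta\ln Y_t + \ln M_{t-1} - \delta\ln M_{t-1} + \delta u_t \\
        &= \delta\ln\beta_0 + \beta_1\delta\ln R_t + \beta_2\delta\ln Y_t + (1-\delta)\ln M_{t-1} + \delta u_t
\end{aligned}
```

The coefficient of ln M_{t-1} is 1 - δ, so the adjustment coefficient is estimated as one minus that coefficient, and each short-run slope is the corresponding long-run elasticity multiplied by δ.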

As an illustration of the short-term and long-term demand for real cash balances, consider the data given in Table 17.5. These quarterly data pertain to Canada for the period 1979 to 1988. The variables are defined as follows: M [as defined by M1 money supply, Canadian dollars (C$), millions], P (implicit price deflator, 1981 = 100), GDP at constant 1981 prices (C$, millions), and R (90-day prime corporate rate of interest, %). 43 M1 was deflated by P to obtain figures for real cash balances. A priori, real money demand is expected to be positively related to GDP (positive income effect) and negatively related to R (the higher the interest rate, the higher the opportunity cost of holding money, as M1 money pays very little interest, if any).

The regression results were as follows: 44

$$\widehat{\ln M_t} = 0.8561 - 0.0634\ln R_t - 0.0237\ln \text{GDP}_t + 0.9607\ln M_{t-1}$$

se = (0.5101)   (0.0131)   (0.0366)   (0.0414)
t = (1.6782)   (-4.8134)   (-0.6466)   (23.1972)

R² = 0.9482   d = 2.4582   F = 213.7234   (17.11.6)


The estimated short-run demand function shows that the short-run interest elasticity has the correct sign and that it is statistically quite significant, as its p value is almost zero. The short-run income elasticity is surprisingly negative, although statistically it is not different from zero. The coefficient of adjustment is δ = (1 - 0.9607) = 0.0393, implying that only about 4 percent of the discrepancy between the desired and actual real cash balances is eliminated in a quarter, a rather slow adjustment.

42 In passing, note that this model is essentially nonlinear in the parameters. Therefore, although OLS may give an unbiased estimate of, say, $\beta_1\delta$ taken together, it may not give unbiased estimates of $\beta_1$ and $\delta$ individually, especially if the sample is small.

43 These data are obtained from B. Bhaskar Rao, ed., Cointegration for the Applied Economist, St. Martin's Press, New York, 1994, pp. 210-213. The original data are from 1956-I to 1988-IV, but for illustration purposes we begin our analysis from the first quarter of 1979.

44 Note this feature of the estimated standard errors. The standard error of, say, the coefficient of $\ln R_t$ refers to the standard error of $\widehat{\beta_1\delta}$, an estimator of $\beta_1\delta$. There is no simple way to obtain the standard errors of $\hat{\beta}_1$ and $\hat{\delta}$ individually from the standard error of $\widehat{\beta_1\delta}$, especially if the sample is relatively small. For large samples, however, individual standard errors of $\hat{\beta}_1$ and $\hat{\delta}$ can be obtained approximately, but the computations are involved. See Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, p. 444.
TABLE 17.5

Observation   M1           R          P         GDP
1979-1        22,175.00    11.13333   0.77947   334,800
1979-2        22,841.00    11.16667   0.80861   336,708
1979-3        23,461.00    11.80000   0.82649   340,096
1979-4        23,427.00    14.18333   0.84863   341,844
1980-1        23,811.00    14.38333   0.86693   342,776
1980-2        23,612.33    12.98333   0.88950   342,264
1980-3        24,543.00    10.71667   0.91553   340,716
1980-4        25,638.66    14.53333   0.93743   347,780
1981-1        25,316.00    17.13333   0.96523   354,836
1981-2        25,501.33    18.56667   0.98774   359,352
1981-3        25,382.33    21.01666   1.01314   356,152
1981-4        24,753.00    16.61665   1.03410   353,636
1982-1        25,094.33    15.35000   1.05743   349,568
1982-2        25,253.66    16.04999   1.07748   345,284
1982-3        24,936.66    14.31667   1.09666   343,028
1982-4        25,553.00    10.88333   1.11641   340,292
1983-1        26,755.33    9.616670   1.12303   346,072
1983-2        27,412.00    9.316670   1.13395   353,860
1983-3        28,403.33    9.333330   1.14721   359,544
1983-4        28,402.33    9.550000   1.16059   362,304
1984-1        28,715.66    10.08333   1.17117   368,280
1984-2        28,996.33    11.45000   1.17406   376,768
1984-3        28,479.33    12.45000   1.17795   381,016
1984-4        28,669.00    10.76667   1.18438   385,396
1985-1        29,018.66    10.51667   1.18990   390,240
1985-2        29,398.66    9.666670   1.20625   391,580
1985-3        30,203.66    9.033330   1.21492   396,384
1985-4        31,059.33    9.016670   1.21805   405,308
1986-1        30,745.33    11.03333   1.22408   405,680
1986-2        30,477.66    8.733330   1.22856   408,116
1986-3        31,563.66    8.466670   1.23916   409,160
1986-4        32,800.66    8.400000   1.25368   409,616
1987-1        33,958.33    7.250000   1.27117   416,484
1987-2        35,795.66    8.300000   1.28429   422,916
1987-3        35,878.66    9.300000   1.29599   429,980
1987-4        36,336.00    8.700000   1.31001   436,264
1988-1        36,480.33    8.616670   1.32325   440,592
1988-2        37,108.66    9.133330   1.33219   446,680
1988-3        38,423.00    10.05000   1.35065   450,328
1988-4        38,480.66    10.83333   1.36648   453,516

Notes: M1 = C$, millions.
P = implicit price deflator (1981 = 100).
R = 90-day prime corporate interest rate, %.
GDP = C$, millions (1981 prices).


To get back to the long-run demand function (17.11.2), all that needs to be done is to divide the short-run demand function through by δ (why?) and drop the $\ln M_{t-1}$ term. The results are:

$$\ln M_t^* = 21.7888 - 1.6132\ln R_t - 0.6030\ln \text{GDP}_t \qquad (17.11.7)^{45}$$
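The division by δ can be carried out mechanically. The sketch below (variable and function names are our own) takes the rounded short-run estimates of Eq. (17.11.6), computes δ = 1 - 0.9607, and divides each remaining coefficient by δ. With these rounded inputs the two elasticities reproduce Eq. (17.11.7) (about -1.6132 and -0.603); the intercept comes out near 21.784 rather than 21.7888, a small gap that presumably reflects rounding of the published coefficients.

```python
# Short-run estimates from Eq. (17.11.6):
#   intercept = delta * ln(beta0), ln R slope = beta1 * delta, ln GDP slope = beta2 * delta
short_run = {"intercept": 0.8561, "ln R": -0.0634, "ln GDP": -0.0237}
lagged_coef = 0.9607  # coefficient of ln M_{t-1}, i.e., 1 - delta


def long_run(short_run, lagged_coef):
    """Recover delta and the long-run coefficients of Eq. (17.11.2)
    by dividing each short-run coefficient by delta = 1 - lagged_coef."""
    delta = 1.0 - lagged_coef
    return delta, {name: coef / delta for name, coef in short_run.items()}


delta, lr = long_run(short_run, lagged_coef)
print(f"delta = {delta:.4f}")
for name, coef in lr.items():
    print(f"long-run {name}: {coef:.4f}")
```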


45 Note that we have not presented the standard errors of the estimated coefficients for reasons 
discussed in footnote 44. 
As can be seen, the long-run interest elasticity of demand for money is substantially greater (in absolute terms) than the corresponding short-run elasticity, which is also true of the income elasticity, although in the present instance its economic and statistical significance is dubious.

Note that the estimated Durbin-Watson d is 2.4582, which is close to 2. This substantiates our previous remark that in the autoregressive models the computed d is generally close to 2. Therefore, we should not trust the computed d to find out whether there was serial correlation in our data. The sample size in our case is 40 observations, which may be reasonably large for applying the h test. In the present case, the reader can verify that the estimated h value is -1.5008, which is not significant at the 5 percent level, perhaps suggesting that there is no first-order autocorrelation in the error term.
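The h value quoted above can be reproduced from Eq. (17.10.1), taking n = 40 as stated, ρ̂ ≈ 1 - d/2 with d = 2.4582, and var(α̂2) equal to the square of the standard error 0.0414 of the ln M_{t-1} coefficient in Eq. (17.11.6). This is a minimal sketch; its result, about -1.501, differs from -1.5008 only in the fourth decimal place, presumably because of rounding in the reported standard error.

```python
import math

n, d, se_lag = 40, 2.4582, 0.0414   # from Eq. (17.11.6); se of the ln M_{t-1} coefficient
rho_hat = 1.0 - d / 2.0             # Eq. (17.10.3)
var_lag = se_lag ** 2
h = rho_hat * math.sqrt(n / (1.0 - n * var_lag))  # Eq. (17.10.1)
p = math.erfc(abs(h) / math.sqrt(2.0))            # two-sided p value under N(0, 1)
print(f"rho_hat = {rho_hat:.4f}, h = {h:.4f}, two-sided p = {p:.3f}")
```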


17.12 Illustrative Examples 

In this section we present a few examples of distributed lag models to show how 
researchers have used them in empirical studies. 


EXAMPLE 17.9 The Fed and the Real Rate of Interest

To assess the effect of M1 (currency + checkable deposits) growth on an Aaa bond real interest rate measure, G. J. Santoni and Courtenay C. Stone 46 estimated, using monthly data, the following distributed lag model for the United States:

$$r_t = \text{constant} + \sum_{i=0}^{11} a_i M_{t-i} + u_t \qquad (17.12.1)$$

where $r_t$ = Moody's Index of Aaa bond yield minus the average annual rate of change in the seasonally adjusted consumer price index over the prior 36 months, which is used as the measure of real interest rate, and $M_t$ = monthly M1 growth.

According to the "neutrality of money doctrine," real economic variables—such as output, employment, economic growth, and the real rate of interest—are not influenced permanently by money growth and, therefore, are essentially unaffected by monetary policy.... Given this argument, the Federal Reserve has no permanent influence over the real rate of interest whatsoever. 47

If this doctrine is valid, then one should expect the distributed lag coefficients $a_i$ as well as their sum to be statistically not different from zero. To find out whether this is the case, the authors estimated Eq. (17.12.1) for two different time periods, February 1951 to September 1979 and October 1979 to November 1982, the latter to take into account the change in the Fed's monetary policy, which since October 1979 has paid more attention to the rate of growth of the money supply than to the rate of interest, which was the policy in the earlier period. Their regression results are presented in Table 17.6. The results seem to support the "neutrality of money doctrine," since for the period February 1951 to September 1979 the current as well as lagged money growth had no statistically significant effect on the real interest rate measure. For the latter period, too, the neutrality doctrine seems to hold since $\sum a_i$ is not statistically different from zero; only the coefficient $a_1$ is significant, but it has the wrong sign. (Why?)


46 "The Fed and the Real Rate of Interest," Review, Federal Reserve Bank of St. Louis, December 1982, pp. 8-18.

47 Ibid., p. 15.




Chapter 1 7 Dynamic Econometric Models: Autoregressive and Distributed-Lag Models 643 


TABLE 17.6 Influence of Monthly M1 Growth on an Aaa Bond Real Interest Rate Measure: February 1951 to November 1982

$$r = \text{constant} + \sum_{i=0}^{11} a_i M_{t-i}$$

                 February 1951 to          October 1979 to
                 September 1979            November 1982
                 Coefficient    |t|        Coefficient    |t|
Constant          1.4885*       2.068       1.0360        0.801
a0               -0.00088       0.388       0.00840       1.014
a1                0.00171       0.510       0.03960*      3.419
a2                0.00170       0.423       0.03112       2.003
a3                0.00233       0.542       0.02719       1.502
a4               -0.00249       0.553       0.00901       0.423
a5               -0.00160       0.348       0.01940       0.863
a6                0.00292       0.631       0.02411       1.056
a7                0.00253       0.556       0.01446       0.666
a8                0.00000       0.001      -0.00036       0.019
a9                0.00074       0.181      -0.00499       0.301
a10               0.00016       0.045      -0.01126       0.888
a11               0.00025       0.107      -0.00178       0.211
Σ a_i             0.00737       0.221       0.1549        0.926
R²                0.9826                    0.8662
D-W               2.07                      2.04
RHO1              1.27*         24.536      1.40*         9.838
RHO2             -0.28*         5.410      -0.48*         3.373
NOB               344.                      38.
SER (= RSS)       0.1548                    0.3899

*Significantly different from zero at the 0.05 level.

Source: G. J. Santoni and Courtenay C. Stone, “The Fed and the Real Rate of Interest,” Review, Federal Reserve Bank of St. Louis, December 1982, p. 16.
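The sums Σa_i reported in Table 17.6 are simply the totals of the twelve lag coefficients, which the sketch below verifies (the list names are our own). Their |t| values cannot be recomputed from the table, because the standard error of a sum depends on the covariances among the individual estimates, which are not reported.

```python
import math

# Lag coefficients a_0, ..., a_11 from Table 17.6
early = [-0.00088, 0.00171, 0.00170, 0.00233, -0.00249, -0.00160,
         0.00292, 0.00253, 0.00000, 0.00074, 0.00016, 0.00025]   # Feb 1951-Sep 1979
late = [0.00840, 0.03960, 0.03112, 0.02719, 0.00901, 0.01940,
        0.02411, 0.01446, -0.00036, -0.00499, -0.01126, -0.00178]  # Oct 1979-Nov 1982

for label, coefs in (("Feb 1951-Sep 1979", early), ("Oct 1979-Nov 1982", late)):
    # math.fsum avoids accumulating floating-point rounding error
    print(f"{label}: sum of a_i = {math.fsum(coefs):.5f}")
```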


EXAMPLE 17.10 The Short- and Long-Run Aggregate Consumption for Sri Lanka, 1967-1993

Suppose consumption C is linearly related to permanent income X*:

$$C_t = \beta_1 + \beta_2 X_t^* + u_t \qquad (17.12.2)$$

Since $X_t^*$ is not directly observable, we need to specify the mechanism that generates permanent income. Suppose we adopt the adaptive expectations hypothesis specified in Eq. (17.5.2). Using Eq. (17.5.2) and simplifying, we obtain the following estimating equation (cf. 17.5.5):

$$C_t = \alpha_1 + \alpha_2 X_t + \alpha_3 C_{t-1} + v_t \qquad (17.12.3)$$

where $\alpha_1 = \gamma\beta_1$
$\alpha_2 = \gamma\beta_2$
$\alpha_3 = (1 - \gamma)$
$v_t = [u_t - (1 - \gamma)u_{t-1}]$

As we know, $\beta_2$ gives the mean response of consumption to, say, a $1 increase in permanent income, whereas $\alpha_2$ gives the mean response of consumption to a $1 increase in current income.


From annual data for Sri Lanka for the period 1967-1993 given in Table 17.7, the following regression results were obtained: 48

$$\hat{C}_t = 1038.403 + 0.4043X_t + 0.5009C_{t-1}$$

se = (2501.455)   (0.0919)   (0.1213)
t = (0.4151)   (4.3979)   (4.1293)

R² = 0.9912   d = 1.4162   F = 1298.466   (17.12.4)

where C= private consumption expenditure, and X= GDP, both at constant prices. We 
also introduced real interest rate in the model, but it was not statistically significant. 

The results show that the short-run marginal propensity to consume (MPC) is 0.4043, suggesting that a 1 rupee increase in the current or observed real income (as measured by real GDP) would increase mean consumption by about 0.40 rupee. But if the increase in income is sustained, then eventually the MPC out of the permanent income will be $\beta_2 = \gamma\beta_2/\gamma = 0.4043/0.4991 = 0.8100$ (since $\gamma = 1 - 0.5009 = 0.4991$), or about 0.81 rupee. In other words, when consumers have had time to adjust to the 1 rupee change in income, they will increase their consumption ultimately by about 0.81 rupee.
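The arithmetic behind the short-run and long-run MPCs, and the way the response to a sustained income change builds up over time in Eq. (17.12.3), can be sketched as follows. The period-by-period path is our own illustration and ignores the error term: a permanent 1-rupee rise in X raises C by α2 at once, by α2(1 + α3) after one period, and so on, approaching α2/(1 - α3). The variable names are hypothetical.

```python
a2, a3 = 0.4043, 0.5009      # coefficients of X_t and C_{t-1} in Eq. (17.12.4)
gamma = 1.0 - a3             # adaptive expectations: a3 = 1 - gamma
long_run_mpc = a2 / gamma    # beta2 = (gamma * beta2) / gamma

# Cumulative response of C to a sustained 1-rupee rise in X:
# after k further periods it is a2 * (1 + a3 + ... + a3**k), which tends to a2 / (1 - a3).
cumulative, effect = [], 0.0
for k in range(10):
    effect += a2 * a3 ** k
    cumulative.append(effect)

print(f"gamma = {gamma:.4f}, long-run MPC = {long_run_mpc:.4f}")
print("cumulative MPC by period:", [round(c, 3) for c in cumulative])
```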

Now suppose that our consumption function were

$$C_t^* = \beta_1 + \beta_2 X_t + u_t \qquad (17.12.5)$$

In this formulation permanent or long-run consumption $C_t^*$ is a linear function of the current or observed income. Since $C_t^*$ is not directly observable, let us invoke the partial adjustment model (17.6.2). Using this model, and after algebraic manipulations, we obtain

$$\begin{aligned}
C_t &= \delta\beta_1 + \delta\beta_2 X_t + (1-\delta)C_{t-1} + \delta u_t \\
    &= \alpha_1 + \alpha_2 X_t + \alpha_3 C_{t-1} + v_t
\end{aligned} \qquad (17.12.6)$$


In appearance, this model is indistinguishable from the adaptive expectations model (17.12.3). Therefore, the regression results given in (17.12.4) are equally applicable here. However, there is a major difference in the interpretation of the two models, not to mention the estimation problem associated with the autoregressive and possibly serially correlated model (17.12.3). The model (17.12.5) is the long-run, or equilibrium, consumption function, whereas the model (17.12.6) is the short-run consumption function. $\beta_2$ measures the long-run MPC, whereas $\alpha_2$ ($= \delta\beta_2$) gives the short-run MPC; the former can be obtained from the latter by dividing it by δ, the coefficient of adjustment.

TABLE 17.7 Private Consumption Expenditure and GDP, Sri Lanka

Observation   PCON       GDP
1967          61,284     78,221
1968          68,814     83,326
1969          76,766     90,490
1970          73,576     92,692
1971          73,256     94,814
1972          67,502     92,590
1973          78,832     101,419
1974          80,240     105,267
1975          84,477     112,149
1976          86,038     116,078
1977          96,275     122,040
1978          101,292    128,578
1979          105,448    136,851
1980          114,570    144,734
1981          120,477    152,846
1982          133,868    164,318
1983          148,004    172,414
1984          149,735    178,433
1985          155,200    185,753
1986          154,165    192,059
1987          155,445    191,288
1988          157,199    196,055
1989          158,576    202,477
1990          169,238    223,225
1991          179,001    233,231
1992          183,687    242,762
1993          198,273    259,555

Notes: PCON = private consumption expenditure.
Source: See footnote 48.

48 The data are obtained from the data disk in Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, New York, 1998. The original data are from World Bank's World Tables.

Returning to (17.12.4), we can now interpret 0.4043 as the short-run MPC. Since δ = 1 - 0.5009 = 0.4991, the long-run MPC is 0.81. Note that the adjustment coefficient of about 0.50 suggests that in any given time period consumers only adjust their consumption one-half of the way toward its desired or long-run level.

This example brings out the crucial point that in appearance the adaptive expectations and the partial adjustment models, or the Koyck model for that matter, are so similar that by just looking at the estimated regression, such as Eq. (17.12.4), one cannot tell which is the correct specification. That is why it is so vital that one specify the theoretical underpinning of the model chosen for empirical analysis and then proceed appropriately. If habit or inertia characterizes consumption behavior, then the partial adjustment model is appropriate. On the other hand, if consumption behavior is forward-looking in the sense that it is based on expected future income, then the adaptive expectations model is appropriate. If it is the latter, then one will have to pay close attention to the estimation problem to obtain consistent estimators. In the former case, OLS will provide consistent estimators, provided the usual OLS assumptions are fulfilled.


17.13 The Almon Approach to Distributed-Lag Models: 

The Almon or Polynomial Distributed Lag (PDL) 49 

Although used extensively in practice, the Koyck distributed-lag model is based on the assumption that the β coefficients decline geometrically as the lag lengthens (see Figure 17.5). This assumption may be too restrictive in some situations. Consider, for example, Figure 17.7.

In Figure 17.7a it is assumed that the β's increase at first and then decrease, whereas in Figure 17.7c it is assumed that they follow a cyclical pattern. Obviously, the Koyck scheme of distributed-lag models will not work in these cases. However, after looking at Figures 17.7a and c, it seems that one can express $\beta_i$ as a function of i, the length of the lag (time), and fit suitable curves to reflect the functional relationship between the two, as indicated in Figures 17.7b and d. This approach is precisely the one suggested by Shirley Almon. To illustrate her technique, let us revert to the finite distributed-lag model considered previously, namely,

$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + u_t \qquad (17.1.2)$$

which may be written more compactly as

$$Y_t = \alpha + \sum_{i=0}^{k} \beta_i X_{t-i} + u_t \qquad (17.13.1)$$

Following a theorem in mathematics known as Weierstrass’ theorem, Almon assumes that $\beta_i$ can be approximated by a suitable-degree polynomial in i, the length of the lag. 50 For instance, if the lag scheme shown in Figure 17.7a applies, we can write

$$\beta_i = a_0 + a_1 i + a_2 i^2 \qquad (17.13.2)$$

49 Shirley Almon, "The Distributed Lag between Capital Appropriations and Expenditures," Econometrica, vol. 33, January 1965, pp. 178-196.

50 Broadly speaking, the theorem states that on a finite closed interval any continuous function may 
be approximated uniformly by a polynomial of a suitable degree. 
which is a quadratic, or second-degree, polynomial in i (see Figure 17.7b). However, if the β's follow the pattern of Figure 17.7c, we can write

$$\beta_i = a_0 + a_1 i + a_2 i^2 + a_3 i^3 \qquad (17.13.3)$$

which is a third-degree polynomial in i (see Figure 17.7d). More generally, we may write

$$\beta_i = a_0 + a_1 i + a_2 i^2 + \cdots + a_m i^m \qquad (17.13.4)$$

which is an mth-degree polynomial in i. It is assumed that m (the degree of the polynomial) 
is less than k (the maximum length of the lag). 

To explain how the Almon scheme works, let us assume that the β's follow the pattern shown in Figure 17.7a and, therefore, the second-degree polynomial approximation is appropriate. Substituting Eq. (17.13.2) into Eq. (17.13.1), we obtain

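$$\begin{aligned} Y_t &= \alpha + \sum_{i=0}^{k} (a_0 + a_1 i + a_2 i^2) X_{t-i} + u_t \\ &= \alpha + a_0 \sum_{i=0}^{k} X_{t-i} + a_1 \sum_{i=0}^{k} i X_{t-i} + a_2 \sum_{i=0}^{k} i^2 X_{t-i} + u_t \end{aligned} \qquad (17.13.5)$$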
Defining

$$Z_{0t} = \sum_{i=0}^{k} X_{t-i}, \qquad Z_{1t} = \sum_{i=0}^{k} i X_{t-i}, \qquad Z_{2t} = \sum_{i=0}^{k} i^2 X_{t-i} \qquad (17.13.6)$$

we may write Eq. (17.13.5) as

$$Y_t = \alpha + a_0 Z_{0t} + a_1 Z_{1t} + a_2 Z_{2t} + u_t \qquad (17.13.7)$$


In the Almon scheme Y is regressed on the constructed variables Z, not on the original X variables. Note that Eq. (17.13.7) can be estimated by the usual OLS procedure. The estimates of α and the a's thus obtained will have all the desirable statistical properties, provided the stochastic disturbance term u satisfies the assumptions of the classical linear regression model. In this respect, the Almon technique has a distinct advantage over the Koyck method because, as we have seen, the latter has some serious estimation problems that result from the presence of the stochastic explanatory variable Y_{t-1} and its likely correlation with the disturbance term.
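As a rough sketch of Eqs. (17.13.6) and (17.13.7), the following Python/NumPy code builds the Z's from a series X, runs OLS of Y on them, and maps the estimated a's back to β's. It is an illustration under stated assumptions, not the routine used by any econometric package. The helper names `almon_design` and `almon_fit` are hypothetical and the data are synthetic. The matrix `H` (with H[i, j] = i**j) simply collects the weights of the Z's, so that β = H a reproduces the mapping of Eq. (17.13.8).

```python
import numpy as np

def almon_design(x, k, m):
    """Return Z (columns Z_0t..Z_mt) and H, where H[i, j] = i**j.

    Z_jt = sum_{i=0}^{k} i**j * X_{t-i}, as in Eq. (17.13.6); the first k
    observations are lost because X_{t-k} is not available for them.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column i holds X_{t-i} for t = k, ..., n - 1.
    lags = np.column_stack([x[k - i : n - i] for i in range(k + 1)])
    H = np.vander(np.arange(k + 1), m + 1, increasing=True).astype(float)
    return lags @ H, H

def almon_fit(y, x, k, m):
    """OLS of Y_t on a constant and the Z's; returns (alpha, a, beta)."""
    Z, H = almon_design(x, k, m)
    y_t = np.asarray(y, dtype=float)[k:]
    X = np.column_stack([np.ones(len(y_t)), Z])
    coef, *_ = np.linalg.lstsq(X, y_t, rcond=None)
    alpha, a = coef[0], coef[1:]
    beta = H @ a  # beta_i = a_0 + a_1 i + ... + a_m i**m, as in Eq. (17.13.8)
    return alpha, a, beta

# Hypothetical noiseless example: k = 3, m = 2, beta_i = 1 + 0.5 i - 0.2 i**2.
rng = np.random.default_rng(0)
x = 50 + rng.normal(size=60).cumsum()
lag = np.arange(4)
beta_true = 1.0 + 0.5 * lag - 0.2 * lag**2
y = np.zeros_like(x)
y[3:] = 2.0 + np.column_stack([x[3 - i : 60 - i] for i in range(4)]) @ beta_true
alpha_hat, a_hat, beta_hat = almon_fit(y, x, k=3, m=2)
print(alpha_hat, a_hat, beta_hat)
```

The synthetic Y contains no disturbance term, and its lag weights lie exactly on a quadratic. OLS on the three Z's therefore recovers α, the a's, and the four β's exactly. With actual data, the β's are only as good as the polynomial approximation.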

Once the a's are estimated from Eq. (17.13.7), the original β's can be estimated from Eq. (17.13.2) (or more generally from Eq. [17.13.4]) as follows:

$$\begin{aligned} \beta_0 &= a_0 \\ \beta_1 &= a_0 + a_1 + a_2 \\ \beta_2 &= a_0 + 2a_1 + 4a_2 \\ \beta_3 &= a_0 + 3a_1 + 9a_2 \\ &\;\;\vdots \\ \beta_k &= a_0 + k a_1 + k^2 a_2 \end{aligned} \qquad (17.13.8)$$


Before we apply the Almon technique, we must resolve the following practical 
problems. 

1. The maximum length of the lag k must be specified in advance. Here perhaps one can follow the advice of Davidson and MacKinnon:

The best approach is probably to settle the question of lag length first, by starting with a very large value of q [the lag length] and then seeing whether the fit of the model deteriorates significantly when it is reduced without imposing any restrictions on the shape of the distributed lag. 51

Remember that if there is some “true” lag length, choosing fewer lags will lead to the “omission of relevant variable bias,” whose consequences, as we saw in Chapter 13, can be very serious. On the other hand, choosing more lags than necessary will lead to the “inclusion of irrelevant variable bias,” whose consequences are less serious. The coefficients can still be consistently estimated by OLS, although the estimators may be less efficient (that is, have larger variances).


51 Russell Davidson and James C. MacKinnon, Estimation and Inference in Econometrics, Oxford 
University Press, New York, 1993, pp. 675-676. 
One can use the Akaike or Schwarz information criterion discussed in Chapter 13 to choose the appropriate lag length. These criteria can also be used to choose the appropriate degree of the polynomial, in addition to the approach discussed in point 2.

2. Having specified k, we must also specify the degree of the polynomial m. Generally, the degree of the polynomial should be at least one more than the number of turning points in the curve relating β_i to i. Thus, in Figure 17.7a there is only one turning point; hence a second-degree polynomial will be a good approximation. In Figure 17.7c there are two turning points; hence a third-degree polynomial will provide a good approximation. A priori, however, one may not know the number of turning points, and therefore the choice of m is largely subjective. However, theory may suggest a particular shape in some cases. In practice, one hopes that a fairly low-degree polynomial (say, m = 2 or 3) will give good results. Having chosen a particular value of m, if we want to find out whether a higher-degree polynomial will give a better fit, we can proceed as follows.

Suppose we must decide between the second- and third-degree polynomials. For the second-degree polynomial the estimating equation is as given by Eq. (17.13.7). For the third-degree polynomial the corresponding equation is

$$Y_t = \alpha + a_0 Z_{0t} + a_1 Z_{1t} + a_2 Z_{2t} + a_3 Z_{3t} + u_t \qquad (17.13.9)$$

where $Z_{3t} = \sum_{i=0}^{k} i^3 X_{t-i}$. After running regression (17.13.9), if we find that a_2 is statistically significant but a_3 is not, we may assume that the second-degree polynomial provides a reasonably good approximation.

Alternatively, as Davidson and MacKinnon suggest, “After q [the lag length] is determined, one can then attempt to determine d [the degree of the polynomial] once again starting with a large value and then reducing it.” 52

However, we must beware of the problem of multicollinearity, which is likely to arise because of the way the Z's are constructed from the X's, as shown in Eq. (17.13.6) (see also Eq. [17.13.10]). As shown in Chapter 10, in cases of serious multicollinearity, â_3 may turn out to be statistically insignificant, not because the true a_3 is zero, but simply because the sample at hand does not allow us to assess the separate impact of Z_3 on Y. Therefore, in our illustration, before we accept the conclusion that the third-degree polynomial is not the correct choice, we must make sure that the multicollinearity problem is not serious, which can be done by applying the techniques discussed in Chapter 10.

3. Once m and k are specified, the Z's can be readily constructed. For instance, if m = 2 and k = 5, the Z's are

$$\begin{aligned} Z_{0t} &= \sum_{i=0}^{5} X_{t-i} = (X_t + X_{t-1} + X_{t-2} + X_{t-3} + X_{t-4} + X_{t-5}) \\ Z_{1t} &= \sum_{i=0}^{5} i X_{t-i} = (X_{t-1} + 2X_{t-2} + 3X_{t-3} + 4X_{t-4} + 5X_{t-5}) \\ Z_{2t} &= \sum_{i=0}^{5} i^2 X_{t-i} = (X_{t-1} + 4X_{t-2} + 9X_{t-3} + 16X_{t-4} + 25X_{t-5}) \end{aligned} \qquad (17.13.10)$$

Notice that the Z's are linear combinations of the original X's. Also notice why the Z's are likely to exhibit multicollinearity.
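The two practical decisions above, the lag length k and the degree m, can be sketched in code as follows. This is a hedged illustration with synthetic data and hypothetical helper names (`choose_lag_length`, `top_coefficient_t`). First, lag lengths 0 through `max_lag` are fitted without restrictions on a common sample and compared by the Schwarz criterion in the log form ln SIC = (p/n) ln n + ln(RSS/n). Here p counts all estimated coefficients, including the intercept; the Akaike version would replace (p/n) ln n by 2p/n. Second, the t ratio of the highest-degree coefficient a_m is computed, in the spirit of the m = 2 versus m = 3 comparison.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients, conventional standard errors, and RSS."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    rss = float(resid @ resid)
    n, p = X.shape
    se = np.sqrt(np.diag(rss / (n - p) * np.linalg.inv(X.T @ X)))
    return coef, se, rss

def lag_block(x, q, start):
    """Columns X_{t-i}, i = 0..q, for t = start, ..., len(x) - 1."""
    return np.column_stack([x[start - i : len(x) - i] for i in range(q + 1)])

def ln_sic(rss, n, p):
    """Schwarz criterion in log form: ln SIC = (p/n) ln n + ln(RSS/n)."""
    return p / n * np.log(n) + np.log(rss / n)

def choose_lag_length(y, x, max_lag):
    """Unrestricted lags 0..max_lag fitted on a common sample; the smallest ln SIC wins."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    y_t = y[max_lag:]
    scores = {}
    for q in range(max_lag + 1):
        X = np.column_stack([np.ones(len(y_t)), lag_block(x, q, max_lag)])
        scores[q] = ln_sic(ols_fit(X, y_t)[2], len(y_t), X.shape[1])
    return min(scores, key=scores.get), scores

def top_coefficient_t(y, x, k, m):
    """t ratio of a_m in the degree-m Almon regression with k lags."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    H = np.vander(np.arange(k + 1), m + 1, increasing=True).astype(float)
    X = np.column_stack([np.ones(len(x) - k), lag_block(x, k, k) @ H])
    coef, se, _ = ols_fit(X, y[k:])
    return coef[-1] / se[-1]

# Hypothetical data whose lag pattern (k = 6) has two turning points, as in Figure 17.7c.
rng = np.random.default_rng(2)
x = rng.normal(size=300)
lag = np.arange(7)
beta_true = 0.2 + lag - 0.45 * lag**2 + 0.05 * lag**3
y = np.zeros(300)
y[6:] = 1.0 + lag_block(x, 6, 6) @ beta_true + 0.1 * rng.normal(size=294)

best_k, sic = choose_lag_length(y, x, max_lag=8)
t_a3 = top_coefficient_t(y, x, k=6, m=3)
print(best_k, t_a3)
```

Every candidate lag length is fitted on the same observations (here, those from t = max_lag onward). Otherwise the RSS values would come from different samples, and the criterion values would not be comparable.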


52 Ibid., pp. 675-676.
Before proceeding to a numerical example, note the advantages of the Almon method. First, it provides a flexible method of incorporating a variety of lag structures (see Exercise 17.17). The Koyck technique, on the other hand, is quite rigid in that it assumes that the β's decline geometrically. Second, unlike the Koyck technique, the Almon method does not require us to worry about the presence of the lagged dependent variable as an explanatory variable in the model and the problems it creates for estimation. Finally, if a sufficiently low-degree polynomial can be fitted, the number of coefficients to be estimated (the a's) is considerably smaller than the original number of coefficients (the β's).

But let us re-emphasize the problems with the Almon technique. First, the degree of the polynomial as well as the maximum value of the lag are largely subjective decisions. Second, for reasons noted previously, the Z variables are likely to exhibit multicollinearity. Therefore, in models like Eq. (17.13.9) the estimated a's are likely to show large standard errors (relative to the values of these coefficients), thereby rendering one or more such coefficients statistically insignificant on the basis of the conventional t test. But this does not necessarily mean that one or more of the original β coefficients will also be statistically insignificant. (The proof of this statement is slightly involved but is suggested in Exercise 17.18.) As a result, the multicollinearity problem may not be as serious as one might think. Besides, as we know, in cases of multicollinearity even if we cannot estimate an individual coefficient precisely, a linear combination of such coefficients (the estimable function) can be estimated more precisely.
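The sketch below shows why the Z's tend to be collinear. It constructs them for k = 5 and m = 2 from a hypothetical trending series; the trend, noise level, and seed are assumptions. It then reports their correlation matrix and their variance inflation factors, computed as the diagonal of the inverse correlation matrix.

```python
import numpy as np

k, m = 5, 2
rng = np.random.default_rng(4)
t = np.arange(80)
x = 100 + 5 * t + rng.normal(scale=2.0, size=80)          # hypothetical trending X
lags = np.column_stack([x[k - i : len(x) - i] for i in range(k + 1)])
H = np.vander(np.arange(k + 1), m + 1, increasing=True)   # weights of Eq. (17.13.10)
Z = lags @ H                                              # columns Z_0t, Z_1t, Z_2t

corr = np.corrcoef(Z, rowvar=False)
vif = np.diag(np.linalg.inv(corr))                        # VIF_j = j-th diagonal of R^{-1}
print(np.round(corr, 4))
print(np.round(vif, 1))
```

With a strongly trending X, the same trend dominates each Z. The pairwise correlations are therefore close to 1 and the variance inflation factors are large. This is the situation in which individual a's can look insignificant even though linear combinations of them, such as the β's, are estimated more precisely.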


EXAMPLE 17.11 Illustration of the Almon Distributed-Lag Model


To illustrate the Almon technique, Table 17.8 gives data on inventories Y and sales X for 
the United States for the period 1954-1999. 

For illustrative purposes, assume that inventories depend on sales in the current year 
and in the preceding 3 years as follows: 

$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \beta_3 X_{t-3} + u_t \qquad (17.13.11)$$


Furthermore, assume that β_i can be approximated by a second-degree polynomial as shown in Eq. (17.13.2). Then, following Eq. (17.13.7), we may write


$$Y_t = \alpha + a_0 Z_{0t} + a_1 Z_{1t} + a_2 Z_{2t} + u_t \qquad (17.13.12)$$


where

$$\begin{aligned} Z_{0t} &= \sum_{i=0}^{3} X_{t-i} = (X_t + X_{t-1} + X_{t-2} + X_{t-3}) \\ Z_{1t} &= \sum_{i=0}^{3} i X_{t-i} = (X_{t-1} + 2X_{t-2} + 3X_{t-3}) \\ Z_{2t} &= \sum_{i=0}^{3} i^2 X_{t-i} = (X_{t-1} + 4X_{t-2} + 9X_{t-3}) \end{aligned} \qquad (17.13.13)$$


The Z variables thus constructed are shown in Table 17.8. Using the data on Y and the Z's, we obtain the following regression:

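$$\begin{aligned} \hat{Y}_t &= 25{,}845.06 + 1.1149 Z_{0t} - 0.3713 Z_{1t} - 0.0600 Z_{2t} \\ \text{se} &= (6596.998) \quad (0.5381) \quad (1.3743) \quad (0.4549) \\ t &= (3.9177) \quad (2.0718) \quad (-0.2702) \quad (-0.1319) \\ R^2 &= 0.9755 \qquad d = 0.1643 \qquad F = 517.7656 \end{aligned} \qquad (17.13.14)$$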


Note: Since we are using a 3-year lag, the total number of observations has been reduced 
from 46 to 43. 


TABLE 17.8 Inventories Y and Sales X, U.S. Manufacturing, and Constructed Z's

Observation  Inventory Y  Sales X         Z0         Z1         Z2
1954              41,612   23,355         NA         NA         NA
1955              45,069   26,480         NA         NA         NA
1956              50,642   27,740         NA         NA         NA
1957              51,871   28,736    106,311    150,765    343,855
1958              50,203   27,248    110,204    163,656    378,016
1959              52,913   30,286    114,010    167,940    391,852
1960              53,786   30,878    117,148    170,990    397,902
1961              54,871   30,922    119,334    173,194    397,254
1962              58,172   33,358    125,444    183,536    427,008
1963              60,029   35,058    130,216    187,836    434,948
1964              63,410   37,331    136,669    194,540    446,788
1965              68,207   40,995    146,742    207,521    477,785
1966              77,986   44,870    158,254    220,831    505,841
1967              84,646   46,486    169,682    238,853    544,829
1968              90,560   50,229    182,580    259,211    594,921
1969              98,145   53,501    195,086    277,811    640,003
1970             101,599   52,805    203,021    293,417    672,791
1971             102,567   55,906    212,441    310,494    718,870
1972             108,121   63,027    225,239    322,019    748,635
1973             124,499   72,931    244,669    333,254    761,896
1974             157,625   84,790    276,654    366,703    828,193
1975             159,708   86,589    307,337    419,733    943,757
1976             174,636   98,797    343,107    474,962  1,082,128
1977             188,378  113,201    383,377    526,345  1,208,263
1978             211,691  126,905    425,492    570,562  1,287,690
1979             242,157  143,936    482,839    649,698  1,468,882
1980             265,215  154,391    538,433    737,349  1,670,365
1981             283,413  168,129    593,361    822,978  1,872,280
1982             311,852  163,351    629,807    908,719  2,081,117
1983             312,379  172,547    658,418    962,782  2,225,386
1984             339,516  190,682    694,709  1,003,636  2,339,112
1985             334,749  194,538    721,118  1,025,829  2,351,029
1986             322,654  194,657    752,424  1,093,543  2,510,189
1987             338,109  206,326    786,203  1,155,779  2,688,947
1988             369,374  224,619    820,140  1,179,254  2,735,796
1989             391,212  236,698    862,300  1,221,242  2,801,836
1990             405,073  242,686    910,329  1,304,914  2,992,108
1991             390,905  239,847    943,850  1,389,939  3,211,049
1992             382,510  250,394    969,625  1,435,313  3,340,873
1993             384,039  260,635    993,562  1,458,146  3,393,956
1994             404,877  279,002  1,029,878  1,480,964  3,420,834
1995             430,985  299,555  1,089,586  1,551,454  3,575,088
1996             436,729  309,622  1,148,814  1,639,464  3,761,278
1997             456,133  327,452  1,215,631  1,745,738  4,018,860
1998             466,798  337,687  1,274,316  1,845,361  4,261,935
1999             470,377  354,961  1,329,722  1,921,457  4,434,093

Note: Y and X are in millions of dollars, seasonally adjusted.
Source: Economic Report of the President, 2001, Table B-57, p. 340. The Z's are as shown in Eq. (17.13.13).
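As a check on Eq. (17.13.13), the Z columns of Table 17.8 can be reproduced from the Sales column. To keep the sketch brief, it uses only the first six years, an arbitrary choice. The same slicing applies to the full 1954–1999 sample.

```python
import numpy as np

# Sales X for 1954-1959, copied from Table 17.8 (millions of dollars).
years = np.arange(1954, 1960)
sales = np.array([23355, 26480, 27740, 28736, 27248, 30286])
k, m = 3, 2
n = len(sales)
lags = np.column_stack([sales[k - i : n - i] for i in range(k + 1)])  # X_t, ..., X_{t-3}
H = np.vander(np.arange(k + 1), m + 1, increasing=True)               # H[i, j] = i**j
Z = lags @ H                                                           # Z_0t, Z_1t, Z_2t
for year, row in zip(years[k:], Z):
    print(year, row)
```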
FIGURE 17.8 Lag structure of the illustrative example.


A brief comment on the preceding results is in order. Of the three Z variables, only Z_0 is individually statistically significant at the 5 percent level. Yet the F value is so high that we can reject the null hypothesis that collectively the Z's have no effect on Y. As you may suspect, this might very well be due to multicollinearity. Also, note that the computed d value is very low. This does not necessarily mean that the residuals suffer from autocorrelation. More likely, the low d value suggests that the model we have used is mis-specified. We will comment on this shortly.

From the estimated a's given in Eq. (17.13.14), we can easily estimate the original β's, as shown in Eq. (17.13.8). In the present example, the results are as follows:


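$$\begin{aligned} \hat\beta_0 &= \hat a_0 = 1.1149 \\ \hat\beta_1 &= (\hat a_0 + \hat a_1 + \hat a_2) = 0.6836 \\ \hat\beta_2 &= (\hat a_0 + 2\hat a_1 + 4\hat a_2) = 0.1321 \\ \hat\beta_3 &= (\hat a_0 + 3\hat a_1 + 9\hat a_2) = -0.5394 \end{aligned} \qquad (17.13.15)$$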
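The arithmetic of Eq. (17.13.15) can be reproduced with a few lines of Python. The last digits differ slightly (0.1323 versus 0.1321, -0.5390 versus -0.5394), presumably because the a's are printed to only four decimals. The sketch also includes a hypothetical helper, `beta_std_errors`, that applies the standard formula for the variance of a linear combination. Since β_i = h_i′a with h_i = (1, i, i²), var(β_i) = h_i′ V h_i, where V is the covariance matrix of the estimated a's. Eq. (17.13.14) does not report V, so the matrix below is invented for illustration only. Note that h_0 = (1, 0, 0) gives se(β_0) = se(a_0), consistent with the value 0.5381 appearing in both regressions.

```python
import numpy as np

# a_0, a_1, a_2 as reported in Eq. (17.13.14).
a_hat = np.array([1.1149, -0.3713, -0.0600])
k = 3
H = np.vander(np.arange(k + 1), len(a_hat), increasing=True).astype(float)  # H[i, j] = i**j
beta_hat = H @ a_hat        # about 1.1149, 0.6836, 0.1323, -0.5390

def beta_std_errors(cov_a, H):
    """se(beta_i) = sqrt(h_i' V h_i), where h_i is row i of H and V = Cov(a_hat)."""
    V = np.asarray(cov_a, dtype=float)
    return np.sqrt(np.einsum("ij,jk,ik->i", H, V, H))

# Hypothetical covariance matrix of the a's (not reported in the text).
V_demo = np.diag([0.25, 0.04, 0.01])
print(beta_hat, beta_std_errors(V_demo, H))
```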


Thus, the estimated distributed-lag model corresponding to Eq. (17.13.11) is:

$$\begin{aligned} \hat{Y}_t &= 25{,}845.0 + 1.1150 X_t + 0.6836 X_{t-1} + 0.1321 X_{t-2} - 0.5394 X_{t-3} \\ \text{se} &= (6596.99) \quad (0.5381) \quad (0.4672) \quad (0.4656) \quad (0.5656) \\ t &= (3.9177) \quad (2.0718) \quad (1.4630) \quad (0.2837) \quad (-0.9537) \end{aligned} \qquad (17.13.16)$$

Geometrically, the estimated β_i's are as shown in Figure 17.8.



[Figure 17.8: estimated lag coefficients β_i plotted against the lag.]


Our illustrative example may be used to point out a few additional features of the Almon lag procedure:

1. The standard errors of the a coefficients are directly obtainable from the OLS regression (17.13.14), but the standard errors of some of the β coefficients, the objective of primary interest, cannot be obtained in this way. They can, however, be obtained from the variances and covariances of the estimated a coefficients by using a well-known formula from statistics, which is given in Exercise 17.18. Of course, there is no need to do this manually, for most statistical packages can do it routinely. The standard errors given in Eq. (17.13.16) were obtained from EViews 6.
2. The β's obtained in Eq. (17.13.16) are called unrestricted estimates in the sense that no a priori restrictions are placed on them. In some situations, however, one may want to impose the so-called endpoint restrictions on the β's by assuming that β_0 and β_k (the current and kth lagged coefficients) are zero. For psychological, institutional, or technical reasons, the value of the explanatory variable in the current period may not have any impact on the current value of the regressand, thereby justifying the zero value for β_0. By the same token, beyond a certain time the kth lagged coefficient may not have any impact on the regressand, thus supporting the assumption that β_k is zero. In our inventory example (Example 17.11), the coefficient of X_{t-3} had a negative sign, which may not make economic sense. Hence, one may want to constrain that coefficient to zero. 53 Of course, you do not have to constrain both ends. You could put a restriction only on the first coefficient, called the near-end restriction, or on the last coefficient, called the far-end restriction. For our inventory example, this is illustrated in Exercise 17.28. Sometimes the β's are estimated with the restriction that their sum is 1. But one should not impose such restrictions mindlessly, because such restrictions also affect the values of the other (unconstrained) lagged coefficients.

3. Since the choice of the number of lagged coefficients as well as the degree of the polynomial is at the discretion of the modeler, some trial and error is inevitable, the charge of data mining notwithstanding. Here is where the Akaike and Schwarz information criteria discussed in Chapter 13 may come in handy.

4. Since we estimated Eq. (17.13.16) using three lags and the second-degree polynomial, 
it is a restricted least-squares model. Suppose we decide to use three lags but do not use 
the Almon polynomial approach. That is, we estimate Eq. (17.13.11) by OLS. What 
then? Let us first see the results: 


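$$\begin{aligned} \hat{Y}_t &= 26{,}008.60 + 0.9771 X_t + 1.0138 X_{t-1} - 0.2022 X_{t-2} - 0.3935 X_{t-3} \\ \text{se} &= (6691.12) \quad (0.6820) \quad (1.0920) \quad (1.1021) \quad (0.7186) \\ t &= (3.8870) \quad (1.4327) \quad (0.9284) \quad (-0.1835) \quad (-0.5476) \\ R^2 &= 0.9755 \qquad d = 0.1571 \qquad F = 379.51 \end{aligned} \qquad (17.13.17)$$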


If you compare these results with those given in Eq. (17.13.16), you will see that the overall R² is practically the same, although the lagged pattern in Eq. (17.13.17) shows more of a humped shape than that exhibited by Eq. (17.13.16). It is left to the reader to verify the R² value from Eq. (17.13.16).
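The endpoint restrictions described in point 2 can be imposed by substitution. With β_i = h_i′a, the near-end restriction β_0 = 0 and the far-end restriction β_k = 0 are linear restrictions R a = 0 on the a's. Writing a = N b, where the columns of N span the null space of R, turns the restricted problem into an ordinary OLS regression on the transformed regressors Z N. The sketch below (NumPy, hypothetical function name, synthetic noiseless data) shows one way of doing this; it is not necessarily how EViews or any other package implements it.

```python
import numpy as np

def almon_fit_restricted(y, x, k, m, near=False, far=False):
    """Almon fit with optional endpoint restrictions beta_0 = 0 (near) and/or beta_k = 0 (far)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(x)
    lags = np.column_stack([x[k - i : n - i] for i in range(k + 1)])
    H = np.vander(np.arange(k + 1), m + 1, increasing=True).astype(float)
    rows = ([H[0]] if near else []) + ([H[k]] if far else [])
    if rows:
        # Restrictions R a = 0; parametrize a = N b with N spanning the null space of R.
        _, s, vt = np.linalg.svd(np.array(rows))
        rank = int(np.sum(s > 1e-10))
        N = vt[rank:].T
    else:
        N = np.eye(m + 1)
    X = np.column_stack([np.ones(n - k), lags @ H @ N])
    coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    a = N @ coef[1:]
    return coef[0], a, H @ a

# Hypothetical noiseless data with k = 4 and m = 2.
rng = np.random.default_rng(8)
x = rng.normal(size=100)
i = np.arange(5)
lags = np.column_stack([x[4 - j : 100 - j] for j in range(5)])

beta_far = 3.6 - 0.5 * i - 0.1 * i**2           # satisfies beta_4 = 0
y_far = np.zeros(100)
y_far[4:] = 1.0 + lags @ beta_far
_, _, b_far = almon_fit_restricted(y_far, x, k=4, m=2, far=True)

beta_both = 0.3 * (4 * i - i**2)                # satisfies beta_0 = beta_4 = 0
y_both = np.zeros(100)
y_both[4:] = 1.0 + lags @ beta_both
_, _, b_both = almon_fit_restricted(y_both, x, k=4, m=2, near=True, far=True)
print(b_far, b_both)
```

For the far-end case with m = 2, the substitution is equivalent to a_0 = -k a_1 - k² a_2, so the regression becomes Y_t = α + a_1(Z_{1t} - kZ_{0t}) + a_2(Z_{2t} - k²Z_{0t}) + u_t. The null-space formulation handles any combination of endpoint restrictions in the same way.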

As this example illustrates, one has to be careful in using the Almon distributed-lag technique, as the results might be sensitive to the choice of the degree of the polynomial and/or the number of lagged coefficients.
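Eq. (17.13.16) is a restricted version of Eq. (17.13.17), so the two fits can also be compared formally with the restricted-versus-unrestricted F test of Eq. (8.7.9). A degree-m polynomial on k + 1 lag coefficients imposes k - m restrictions. The sketch below computes both R² values and this F statistic on synthetic data; the humped lag pattern, sample size, and seed are assumptions. On a common sample, the restricted R² can never exceed the unrestricted one.

```python
import numpy as np

def rss_and_r2(X, y):
    """Residual sum of squares and R-squared of an OLS fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    rss = float(resid @ resid)
    return rss, 1.0 - rss / float(np.sum((y - y.mean()) ** 2))

def almon_restriction_test(y, x, k, m):
    """Unrestricted OLS on X_t..X_{t-k} versus the degree-m Almon fit on the same sample."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n_total = len(x)
    y_t = y[k:]
    n = len(y_t)
    lags = np.column_stack([x[k - i : n_total - i] for i in range(k + 1)])
    H = np.vander(np.arange(k + 1), m + 1, increasing=True).astype(float)
    ones = np.ones((n, 1))
    rss_u, r2_u = rss_and_r2(np.hstack([ones, lags]), y_t)       # analogue of Eq. (17.13.17)
    rss_r, r2_r = rss_and_r2(np.hstack([ones, lags @ H]), y_t)   # analogue of Eq. (17.13.16)
    q = k - m                   # restrictions implied by the polynomial
    df = n - (k + 2)            # constant plus k + 1 unrestricted slopes
    F = ((rss_r - rss_u) / q) / (rss_u / df)
    return r2_u, r2_r, F, (q, df)

# Hypothetical data with a humped lag pattern (k = 3), tested against m = 2.
rng = np.random.default_rng(7)
x = 50 + rng.normal(size=120).cumsum()
beta_hump = np.array([0.2, 0.9, 0.4, 0.1])
y = np.zeros(120)
y[3:] = 5 + np.column_stack([x[3 - i : 120 - i] for i in range(4)]) @ beta_hump + rng.normal(size=117)
r2_u, r2_r, F, df = almon_restriction_test(y, x, k=3, m=2)
print(r2_u, r2_r, F, df)
```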


17.14 Causality in Economics: The Granger Causality Test 54 


Back in Section 1.4 we noted that, although regression analysis deals with the dependence 
of one variable on other variables, it does not necessarily imply causation. In other words, 
the existence of a relationship between variables does not prove causality or the direction 

53 For a concrete application, see D. B. Batten and Daniel Thornton, "Polynomial Distributed Lags and the 
Estimation of the St. Louis Equation," Review, Federal Reserve Bank of St. Louis, April 1983, pp. 13-25. 
54 There is another test of causality that is sometimes used, the so-called Sims test of causality. We 
discuss it by way of an exercise. 



Chapter 1 7 Dynamic Econometric Models: Autoregressive and Distributed-Lag Models 653 


of influence. But in regressions involving time series data, the situation may be somewhat 
different because, as one author puts it, 

. . . time does not run backward. That is, if event A happens before event B, then it is possible that A is causing B. However, it is not possible that B is causing A. In other words, events in the past can cause events to happen today. Future events cannot. 55 [Emphasis added.]

This is roughly the idea behind the so-called Granger causality test. 56 But it should be noted clearly that the question of causality is deeply philosophical, with all kinds of controversies. At one extreme are people who believe that “everything causes everything,” and at the other extreme are people who deny the existence of causation whatsoever. 57 The econometrician Edward Leamer prefers the term precedence over causality. Francis Diebold prefers the term predictive causality. As he writes:

... the statement “y_i causes y_j” is just shorthand for the more precise, but long-winded, statement, “y_i contains useful information for predicting y_j (in the linear least squares sense), over and above the past histories of the other variables in the system.” To save space, we simply say that y_i causes y_j. 58


The Granger Test 

To explain the Granger test, we will consider the often asked question in macroeconomics: Is it GDP that “causes” the money supply M (GDP → M)? Or is it the money supply M that causes GDP (M → GDP)? (The arrow points in the direction of causality.) The Granger causality test assumes that the information relevant to the prediction of the respective variables, GDP and M, is contained solely in the time series data on these variables. The test involves estimating the following pair of regressions:

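$$\text{GDP}_t = \sum_{i=1}^{n} \alpha_i M_{t-i} + \sum_{j=1}^{n} \beta_j \text{GDP}_{t-j} + u_{1t} \qquad (17.14.1)$$

$$M_t = \sum_{i=1}^{n} \lambda_i M_{t-i} + \sum_{j=1}^{n} \delta_j \text{GDP}_{t-j} + u_{2t} \qquad (17.14.2)$$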

where it is assumed that the disturbances u_{1t} and u_{2t} are uncorrelated. In passing, note that, since we have two variables, we are dealing with bilateral causality. In the chapters on time series econometrics, we will extend this to multivariable causality through the technique of vector autoregression (VAR).

Equation (17.14.1) postulates that current GDP is related to past values of itself as well as to past values of M, and Eq. (17.14.2) postulates a similar behavior for M. Note that these regressions can


55 Gary Koop, Analysis of Economic Data, John Wiley & Sons, New York, 2000, p. 175. 

56 C. W. J. Granger, "Investigating Causal Relations by Econometric Models and Cross-Spectral Methods," Econometrica, July 1969, pp. 424-438. Although popularly known as the Granger causality test, it is appropriate to call it the Wiener-Granger causality test, for it was earlier suggested by Wiener. See N. Wiener, "The Theory of Prediction," in E. F. Beckenbach, ed., Modern Mathematics for Engineers, McGraw-Hill, New York, 1956, pp. 165-190.

57 For an excellent discussion of this topic, see Arnold Zellner, "Causality and Econometrics," Carnegie-Rochester Conference Series, 10, K. Brunner and A. H. Meltzer, eds., North-Holland Publishing Company, Amsterdam, 1979, pp. 9-50.

58 Francis X. Diebold, Elements of Forecasting, South Western Publishing, 2d ed., 2001, p. 254. 
be cast in growth forms, $\dot{\text{GDP}}$ and $\dot{M}$, where a dot over a variable indicates its growth rate. We now distinguish four cases:

1. Unidirectional causality from M to GDP is indicated if the estimated coefficients on the lagged M in Eq. (17.14.1) are statistically different from zero as a group and the set of estimated coefficients on the lagged GDP in Eq. (17.14.2) is not statistically different from zero.

2. Conversely, unidirectional causality from GDP to M exists if the set of lagged M coefficients in Eq. (17.14.1) is not statistically different from zero and the set of the lagged GDP coefficients in Eq. (17.14.2) is statistically different from zero.

3. Feedback, or bilateral causality, is suggested when the sets of M and GDP coefficients are statistically significantly different from zero in both regressions.

4. Finally, independence is suggested when the sets of M and GDP coefficients are not statistically significant in either of the regressions.

More generally, since the future cannot predict the past, if variable X (Granger) causes 
variable Y, then changes in X should precede changes in Y. Therefore, in a regression of Y 
on other variables (including its own past values) if we include past or lagged values of X 
and it significantly improves the prediction of Y, then we can say that X (Granger) causes Y. 
A similar definition applies if Y (Granger) causes X. 

The steps involved in implementing the Granger causality test are as follows. We illustrate these steps with the GDP-money example given in Eq. (17.14.1).

1. Regress current GDP on all lagged GDP terms and other variables, if any, but do not include the lagged M variables in this regression. As per Chapter 8, this is the restricted regression. From this regression obtain the restricted residual sum of squares, RSS_R.

2. Now run the regression including the lagged M terms. In the language of Chapter 8, this is the unrestricted regression. From this regression obtain the unrestricted residual sum of squares, RSS_UR.

3. The null hypothesis is H_0: α_i = 0, i = 1, 2, . . . , n, that is, lagged M terms do not belong in the regression.

4. To test this hypothesis, we apply the F test given by Eq. (8.7.9), namely,

$$F = \frac{(\text{RSS}_R - \text{RSS}_{UR})/m}{\text{RSS}_{UR}/(n - k)} \qquad (8.7.9)$$

which follows the F distribution with m and (n − k) df. In the present case m is equal to the number of lagged M terms and k is the number of parameters estimated in the unrestricted regression.

5. If the computed F value exceeds the critical F value at the chosen level of significance, we reject the null hypothesis, in which case the lagged M terms belong in the regression. This is another way of saying that M causes GDP.

6. Steps 1 to 5 can be repeated to test the model (17.14.2), that is, whether GDP causes M. 
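Steps 1 through 6 can be sketched in Python as follows. The helper names are hypothetical and the data are simulated, with money generated to lead GDP by construction. One assumption differs from Eqs. (17.14.1) and (17.14.2): a constant term is included in both regressions, which is common in practice; drop the column of ones to match the equations exactly. The function returns the F statistic of Eq. (8.7.9) with its degrees of freedom (m, n − k). Step 5 then compares F with the critical value. If SciPy is available, `scipy.stats.f.sf(F, m, n - k)` gives the corresponding p value.

```python
import numpy as np

def lag_matrix(v, p):
    """Columns v_{t-1}, ..., v_{t-p} for t = p, ..., len(v) - 1."""
    n = len(v)
    return np.column_stack([v[p - j : n - j] for j in range(1, p + 1)])

def rss(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef
    return float(e @ e)

def granger_f(target, cause, p):
    """F test of H0: the p lagged values of `cause` do not belong in the equation for `target`."""
    target, cause = np.asarray(target, float), np.asarray(cause, float)
    y = target[p:]
    n = len(y)
    X_r = np.column_stack([np.ones(n), lag_matrix(target, p)])    # step 1: restricted
    X_ur = np.column_stack([X_r, lag_matrix(cause, p)])           # step 2: unrestricted
    rss_r, rss_ur = rss(X_r, y), rss(X_ur, y)
    m, k = p, X_ur.shape[1]           # m restrictions (step 3), k estimated parameters
    F = ((rss_r - rss_ur) / m) / (rss_ur / (n - k))               # step 4: Eq. (8.7.9)
    return F, (m, n - k)

# Simulated data in which money leads GDP by construction (hypothetical coefficients).
rng = np.random.default_rng(5)
T = 200
money, gdp = np.zeros(T), np.zeros(T)
for t in range(1, T):
    money[t] = 0.5 * money[t - 1] + rng.normal()
    gdp[t] = 0.3 * gdp[t - 1] + 0.8 * money[t - 1] + rng.normal()

F_m_to_gdp, df = granger_f(gdp, money, p=4)   # does M Granger-cause GDP?
F_gdp_to_m, _ = granger_f(money, gdp, p=4)    # step 6: the reverse direction
print(F_m_to_gdp, F_gdp_to_m, df)
```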
Before we illustrate the Granger causality test, there are several things that need to be noted:


1. It is assumed that the two variables, GDP and M, are stationary. We have already discussed the concept of stationarity in intuitive terms and will discuss it more formally in Chapter 21. Sometimes taking the first differences of the variables makes them stationary, if they are not already stationary in level form.
2. The number of lagged terms to be introduced in the causality tests is an important practical question. As in the case of the distributed-lag models, we may have to use the Akaike or Schwarz information criterion to make the choice. But it should be added that the direction of causality may depend critically on the number of lagged terms included.

3. We have assumed that the error terms entering the causality test are uncorrelated. If this is not the case, an appropriate transformation, as discussed in Chapter 12, may have to be applied. 59

4. Since our interest is in testing for causality, one need not present the estimated coefficients of models (17.14.1) and (17.14.2) explicitly (to save space); just the results of the F test given in Eq. (8.7.9) will suffice.

5. One has to guard against “spurious” causality. In our GDP-money example, suppose we consider the interest rate, say the short-term interest rate. It is quite possible that money “Granger-causes” the interest rate and the interest rate in turn “Granger-causes” GDP. Therefore, if we do not account for the interest rate and find that it is money that causes GDP, the observed causality between GDP and money may be spurious. 60 As noted previously, one way of dealing with this is to consider a multiple-equation system, such as vector autoregression (VAR), which we will discuss at some length in Chapter 22.
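Notes 1 and 2 can be illustrated together. In the sketch below, two hypothetical nonstationary (random-walk) series are first differenced. The Granger F statistic is then recomputed for 2, 4, 6, and 8 lags, mirroring the layout of Example 17.13. The simulated relation (the differenced interest rate depends on last period's change in money) and the seed are assumptions. The compact `granger_f` helper follows the same restricted/unrestricted logic as steps 1 to 5, with a constant included.

```python
import numpy as np

def granger_f(target, cause, p):
    """Restricted (constant + p own lags) vs. unrestricted (+ p lags of `cause`) F statistic."""
    n = len(target)
    nobs = n - p
    y = target[p:]

    def lags(v):
        return np.column_stack([v[p - j : n - j] for j in range(1, p + 1)])

    def rss(X):
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        return float(e @ e)

    X_r = np.column_stack([np.ones(nobs), lags(target)])
    X_ur = np.column_stack([X_r, lags(cause)])
    return ((rss(X_r) - rss(X_ur)) / p) / (rss(X_ur) / (nobs - X_ur.shape[1]))

rng = np.random.default_rng(6)
m_level = rng.normal(size=121).cumsum()                 # hypothetical I(1) "money" series
m_lag = np.concatenate([[0.0], m_level[:-1]])           # M_{t-1}
r_level = 0.8 * m_lag + rng.normal(size=121).cumsum()   # hypothetical I(1) "interest rate"
dm, dr = np.diff(m_level), np.diff(r_level)             # first differences (note 1)

results = {p: (granger_f(dr, dm, p), granger_f(dm, dr, p)) for p in (2, 4, 6, 8)}
for p, (f_m_to_r, f_r_to_m) in results.items():
    print(f"lags = {p}: F(M -> R) = {f_m_to_r:.2f}, F(R -> M) = {f_r_to_m:.2f}")
```

Comparing the rows for different lag lengths shows how the F statistic, and hence the decision, can move with the number of lags.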


EXAMPLE 17.12 Causality between Money and Income


R. W. Hafer used the Granger test to find out the nature of causality between GNP (rather than GDP) and M for the United States for the period 1960-I to 1980-IV. Instead of using the gross values of these variables, he used their growth rates, $\dot{\text{GNP}}$ and $\dot{M}$, and used four lags of each variable in the two regressions given previously. The results were as follows: 61
The null hypothesis in each case is that the variable under consideration does not "Granger-cause" the other variable.


Direction of Causality   F Value   Decision
M → GNP                     2.68   Reject
GNP → M                     0.56   Do not reject


These results suggest that the direction of causality is from money growth to GNP 
growth since the estimated F is significant at the 5 percent level; the critical F value is 2.50 
(for 4 and 71 df). On the other hand, there is no "reverse causation" from GNP growth to 
money growth, since the F value is statistically insignificant. 


EXAMPLE 17.13 Causality between Money and Interest Rate in Canada


Refer to the Canadian data given in Table 17.5. Suppose we want to find out if there is any 
causality between money supply and interest rate in Canada for the quarterly periods of 
1979-1988. To show that the Granger causality test depends critically on the number of 
lagged terms introduced in the model, we present below the results of the F test using 
several (quarterly) lags. In each case, the null hypothesis is that interest rate does not 
(Granger-) cause money supply and vice versa. 



59 For further details, see Wojciech W. Charemza and Derek F. Deadman, New Directions in Econometric 
Practice: General to Specific Modelling, Cointegration and Vector Autoregression, 3d ed., Edward Elgar 
Publishing, 1997, Chapter 6. 

60 On this, see J. H. Stock and M. W. Watson, "Interpreting the Evidence on Money-Income Causality," 
Journal of Econometrics, vol. 40, 1989, pp. 783-820. 

61 R. W. Hafer, "The Role of Fiscal Policy in the St. Louis Equation," Review, Federal Reserve Bank of 
St. Louis, January 1982, pp. 17-22. See his footnote 12 for the details of the procedure. 
Direction of Causality   Number of Lags   F Value   Decision
R → M                                 2     12.92   Reject
M → R                                 2      3.22   Reject
R → M                                 4      5.59   Reject
M → R                                 4      2.45   Reject (at 7%)
R → M                                 6    3.5163   Reject
M → R                                 6      2.71   Reject
R → M                                 8      1.40   Do not reject
M → R                                 8      1.62   Do not reject


Note these features of the preceding results of the F test: Up to six lags, there is bilateral causality between money supply and interest rate. However, at eight lags, there is no statistically discernible relationship between the two variables. This reinforces the point made earlier that the outcome of the Granger test is sensitive to the number of lags introduced in the model.


EXAMPLE 17.14 Causality between GDP Growth Rate and Gross Savings Rate in Nine East Asian Countries

A study of the bilateral causality between GDP growth rate (g) and gross savings rate (s) showed the results given in Table 17.9. 62 For comparison, the results for the United States are also presented in the table. By and large, the results presented in Table 17.9 show that for most East Asian countries the causality runs from the GDP growth rate to the gross savings rate. By contrast, for the United States for the period 1950-1988 up to lag 3, causality ran in both directions, but for lags 4 and 5, the causality ran from the GDP growth rate to the savings rate but not the other way round.


TABLE 17.9 Tests of Bivariate Granger Causality between the Real Per Capita GDP Growth Rate and the Gross Savings Rate

Source: World Bank, The East Asian Miracle: Economic Growth and Public Policy, Oxford University Press, New ... (Table A5-2). The original source is Robert Summers and Alan Heston, "The Penn World Tables (Mark 5): An Expanded Set of International Comparisons, 1950-88," Quarterly Journal of Economics, ...

Columns (in each of two side-by-side panels): Economy, Years | Years of Lags | Lagged Right-hand Side Variable: Savings | Growth


Hong Kong, 1 

1960-88 2 


Indonesia, 1 

1965 2 


1950-88 2 


Malaysia, 
1955-88 


Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

NS 

NS 


NS 

NS 

Sig 

NS 

NS 


Sig 

Sig 

NS 


Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

NS 


Philippines, 1 

1950-88 2 


Singapore, 1 

1960-88 2 


Taiwan, China, 1 
1950-88 2 


Thailand, 1 

1950-88 2 


United States, 1 

1950-88 2 


Sig 


Sig 

Sig 

Sig 

Sig 

Sig 


NS 

Sig 

Sig 

Sig 

NS 

NS 


NS 

NS 


NS 

NS 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 

Sig 


Sig: Significant; NS: Not significant. 

Note: Growth is real per capita GDP growth at 1985 international prices. 

62 These results are obtained from The East Asian Miracle: Economic Growth and Public Policy, published 
for the World Bank by Oxford University Press, 1993, p. 244. 
EXAMPLE 17.14 (Continued)

To conclude our discussion of Granger causality, keep in mind that the question we are
examining is whether statistically one can detect the direction of causality when temporally
there is a lead-lag relationship between two variables. If causality is established, it suggests 
that one can use a variable to better predict the other variable than simply the past history 
of that other variable. In the case of the East Asian economies, it seems that we can better 
predict the gross savings rate by considering the lagged values of the GDP growth rate 
than merely the lagged values of the gross savings rate. 
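Any Granger test of this kind compares two regressions. The restricted one regresses the variable of interest on its own lags. The unrestricted one adds lags of the other variable. The two are compared with the F statistic $F = [(RSS_R - RSS_{UR})/m]/[RSS_{UR}/(n-k)]$. The following is a minimal Python sketch of that F test. It assumes a constant, the same lag length m for both variables, and simulated data in which x drives y. The helper names `ols_coef`, `rss` and `granger_f_stat` are hypothetical, and the sketch does not reproduce the World Bank results in Table 17.9.

```python
import random

def ols_coef(X, y):
    """OLS coefficients from the normal equations (X'X)b = X'y, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * v for r, v in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(a[row][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def rss(X, y):
    """Residual sum of squares of the OLS fit of y on the columns of X."""
    b = ols_coef(X, y)
    return sum((yt - sum(bi * xi for bi, xi in zip(b, row))) ** 2 for row, yt in zip(X, y))

def granger_f_stat(y, x, m):
    """F statistic for H0: lags 1..m of x do not help explain y, given m lags of y and a constant."""
    restricted, unrestricted, target = [], [], []
    for t in range(m, len(y)):  # drop the first m observations so both regressions use the same sample
        own_lags = [y[t - i] for i in range(1, m + 1)]
        other_lags = [x[t - i] for i in range(1, m + 1)]
        restricted.append([1.0] + own_lags)
        unrestricted.append([1.0] + own_lags + other_lags)
        target.append(y[t])
    rss_r, rss_ur = rss(restricted, target), rss(unrestricted, target)
    n, k = len(target), 1 + 2 * m
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))

# Simulated data (an assumption for illustration): x_{t-1} enters y_t, but y never enters x.
random.seed(1)
T = 200
x = [random.gauss(0, 1) for _ in range(T)]
y = [0.0] * T
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 1)

for m in (1, 2, 4):
    # lag length, F for "x Granger-causes y", F for "y Granger-causes x"
    print(m, round(granger_f_stat(y, x, m), 2), round(granger_f_stat(x, y, m), 2))
```

Each printed F statistic should be compared with the F(m, n - k) critical value. Rerunning the test for several values of m shows how sensitive the conclusion can be to the lag length.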


*A Note on Causality and Exogeneity 

As we will study in the chapters on simultaneous-equation models in Part 4 of this text, economic variables are often classified into two broad categories, endogenous and exogenous. Loosely speaking, endogenous variables are the equivalent of the dependent variable in the single-equation regression model and exogenous variables are the equivalent of the X variables, or regressors, in such a model, provided the X variables are uncorrelated with the error term in that equation. 63

Now we raise an interesting question: Suppose in a Granger causality test we find that an X variable (Granger-) causes a Y variable without being caused by the latter (i.e., no bilateral causality). Can we then treat the X variable as exogenous? In other words, can we use Granger causality (or noncausality) to establish exogeneity?

To answer this question, we need to distinguish three types of exogeneity: (1) weak, (2) strong, and (3) super. To keep the exposition simple, suppose we consider only two variables, $Y_t$ and $X_t$, and further suppose we regress $Y_t$ on $X_t$. We say that $X_t$ is weakly exogenous if $Y_t$ also does not explain $X_t$. In this case estimation and testing of the regression model can be done, conditional on the values of $X_t$. As a matter of fact, going back to Chapter 2, you will realize that our regression modeling was conditional on the values of the X variables. $X_t$ is said to be strongly exogenous if current and lagged Y values do not explain it (i.e., no feedback relationship). And $X_t$ is super-exogenous if the parameters in the regression of Y on X do not change even if the X values change; that is, the parameter values are invariant to changes in the value(s) of X. If that is in fact the case, then the famous "Lucas critique" may lose its force. 64

The reason for distinguishing the three types of exogeneity is that, “In general, weak 
exogeneity is all that is needed for estimating and testing, strong exogeneity is necessary 
for forecasting and super exogeneity for policy analysis.” 65 

Returning to Granger causality, if a variable, say Y, does not cause another variable, say X, can we then assume that the latter is exogenous? Unfortunately, the answer is not straightforward. If we are talking about weak exogeneity, it can be shown that Granger noncausality is neither necessary nor sufficient to establish exogeneity. On the other hand, Granger noncausality is necessary (but not sufficient) for strong exogeneity. The proofs of these statements are beyond the scope of this book. 66 For our purpose, then, it is better to keep the concepts of Granger causality and exogeneity separate and treat the former as a useful descriptive tool for time series data. In Chapter 19 we will discuss a test that can be used to find out if a variable can be treated as exogenous.

*Optional.

63 Of course, if the explanatory variables include one or more lagged terms of the endogenous variable, this requirement may not be fulfilled.

64 The Nobel laureate Robert Lucas put forth the proposition that existing relations between economic variables may change when policy changes, in which case the estimated parameters from a regression model will be of little value for prediction. On this, see Olivier Blanchard, Macroeconomics, Prentice Hall, 1997, pp. 371-372.

65 Keith Cuthbertson, Stephen G. Hall, and Mark P. Taylor, Applied Econometric Techniques, University of Michigan Press, 1992, p. 100.

66 For a comparatively simple discussion, see G. S. Maddala, Introduction to Econometrics, 2d ed., Macmillan, New York, 1992, pp. 394-395, and also David F. Hendry, Dynamic Econometrics, Oxford University Press, New York, 1995, Chapter 5.


Summary and Conclusions


1. For psychological, technological, and institutional reasons, a regressand may respond 
to a regressor(s) with a time lag. Regression models that take into account time lags are 
known as dynamic or lagged regression models. 

2. There are two types of lagged models: distributed-lag and autoregressive. In the 
former, the current and lagged values of regressors are explanatory variables. In the 
latter, the lagged value(s) of the regressand appears as an explanatory variable(s). 

3. A purely distributed-lag model can be estimated by OLS, but in that case there is the 
problem of multicollinearity since successive lagged values of a regressor tend to be 
correlated. 

4. As a result, some shortcut methods have been devised. These include the Koyck, the 
adaptive expectations, and partial adjustment mechanisms, the first being a purely 
algebraic approach and the other two being based on economic principles. 

5. A unique feature of the Koyck, adaptive expectations, and partial adjustment models is that they all are autoregressive in nature in that the lagged value(s) of the regressand appears as one of the explanatory variables.

6. Autoregressiveness poses estimation challenges; if the lagged regressand is correlated with the error term, OLS estimators of such models are not only biased but also inconsistent. Bias and inconsistency are the case with the Koyck and the adaptive expectations models; the partial adjustment model is different in that it can be consistently estimated by OLS despite the presence of the lagged regressand.

7. To estimate the Koyck and adaptive expectations models consistently, the most popular method is the method of instrumental variables. The instrumental variable is a proxy variable for the lagged regressand but with the property that it is uncorrelated with the error term.

8. An alternative to the lagged regression models just discussed is the Almon polynomial 
distributed-lag model, which avoids the estimation problems associated with the 
autoregressive models. The major problem with the Almon approach, however, is that 
one must prespecify both the lag length and the degree of the polynomial. There are 
both formal and informal methods of resolving the choice of the lag length and the 
degree of the polynomial. 

9. Despite the estimation problems, which can be surmounted, the distributed and 
autoregressive models have proved extremely useful in empirical economics because 
they make the otherwise static economic theory a dynamic one by taking into account 
explicitly the role of time. Such models help us to distinguish between the short- and 
the long-run responses of the dependent variable to a unit change in the value of the 
explanatory variable(s). Thus, for estimating short- and long-run price, income, 
substitution, and other elasticities these models have proved to be highly useful. 67 

10. Because of the lags involved, distributed and/or autoregressive models raise the topic of causality in economic variables. In applied work, Granger causality modeling has received considerable attention. But one has to exercise great caution in using the Granger methodology because it is very sensitive to the lag length used in the model.

11. Even if a variable (X) "Granger-causes" another variable (Y), it does not mean that X is exogenous. We distinguished three types of exogeneity (weak, strong, and super) and pointed out the importance of the distinction.

67 For applications of these models, see Arnold C. Harberger, ed., The Demand for Durable Goods, University of Chicago Press, Chicago, 1960.
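Items 4 to 6 and 9 can be illustrated numerically. The following is a minimal Python sketch that assumes the Koyck scheme $\beta_k = \beta_0 \lambda^k$ with $0 < \lambda < 1$. In the estimated autoregressive form $Y_t = \alpha(1-\lambda) + \beta_0 X_t + \lambda Y_{t-1} + v_t$, the coefficient of $X_t$ is the short-run (impact) effect. The long-run effect is that coefficient divided by one minus the coefficient of $Y_{t-1}$, which equals the sum of the geometric lag weights. The same ratio gives the long-run effect in the partial adjustment model, where the coefficient of $Y_{t-1}$ is $1-\delta$. The coefficient values and function names below are hypothetical.

```python
def koyck_weights(beta0, lam, n_lags):
    """Distributed-lag weights beta_k = beta0 * lam**k, k = 0..n_lags, implied by the Koyck scheme."""
    return [beta0 * lam ** k for k in range(n_lags + 1)]

def short_and_long_run(coef_x, coef_lagged_y):
    """Short- and long-run effects of X from Y_t = a + coef_x * X_t + coef_lagged_y * Y_{t-1} + v_t."""
    if not 0.0 <= coef_lagged_y < 1.0:
        raise ValueError("the coefficient of Y_{t-1} must lie in [0, 1) for a finite long-run effect")
    return coef_x, coef_x / (1.0 - coef_lagged_y)

# Hypothetical estimates: beta0 = 0.4 (coefficient of X_t), lambda = 0.6 (coefficient of Y_{t-1})
weights = koyck_weights(0.4, 0.6, 60)
short_run, long_run = short_and_long_run(0.4, 0.6)
delta = 1.0 - 0.6  # speed of adjustment if the same equation came from the partial adjustment model
print(short_run, long_run, sum(weights), delta)
```

With 60 lags the truncated sum of weights is practically equal to the long-run effect, because $\lambda^{61}$ is negligible.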


EXERCISES

Questions


17.1. Explain with a brief reason whether the following statements are true, false, or uncertain:

a. All econometric models are essentially dynamic.

b. The Koyck model will not make much sense if some of the distributed-lag coefficients are positive and some are negative.

c. If the Koyck and adaptive expectations models are estimated by OLS, the estimators will be biased but consistent.

d. In the partial adjustment model, OLS estimators are biased in finite samples.

e. In the presence of a stochastic regressor(s) and an autocorrelated error term, the method of instrumental variables will produce unbiased as well as consistent estimates.

f. In the presence of a lagged regressand as a regressor, the Durbin-Watson d statistic to detect autocorrelation is practically useless.

g. The Durbin h test is valid in both large and small samples.

h. The Granger test is a test of precedence rather than a test of causality.

17.2. Establish Eq. (17.7.2). 

17.3. Prove Eq. (17.8.3). 

17.4. Assume that prices are formed according to the following adaptive expectations hypothesis:

$$P_t^* = \gamma P_{t-1} + (1-\gamma) P_{t-1}^*$$

where $P^*$ is the expected price and $P$ the actual price.
Complete the following table, assuming $\gamma = 0.5$:*


Period   P*    P
t - 3    100   110
t - 2          125
t - 1          155
t              185
t + 1


*Adapted from C. K. Shaw, op. cit., p. 26.

17.5. Consider the model

$$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 Y_{t-1} + v_t$$

Suppose $Y_{t-1}$ and $v_t$ are correlated. To remove the correlation, suppose we use the following instrumental variable approach: First regress $Y_t$ on $X_{1t}$ and $X_{2t}$ and obtain the estimated $\hat{Y}_t$ from this regression. Then regress

$$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 \hat{Y}_{t-1} + v_t$$

where $\hat{Y}_{t-1}$ are estimated from the first-stage regression.

a. How does this procedure remove the correlation between $Y_{t-1}$ and $v_t$ in the original model?

b. What are the advantages of the recommended procedure over the Liviatan approach?

*17.6. a. Establish (17.4.8).

b. Evaluate the median lag for $\lambda$ = 0.2, 0.4, 0.6, 0.8.

c. Is there any systematic relationship between the value of $\lambda$ and the value of the median lag?

17.7. a. Prove that for the Koyck model, the mean lag is as shown in Eq. (17.4.10).
b. If $\lambda$ is relatively large, what are its implications?

17.8. Using the formula for the mean lag given in Eq. (17.4.9), verify the mean lag of 10.959 quarters reported in the illustration of Table 17.1.

17.9. Suppose

$$M_t = \alpha + \beta_1 Y_t^* + \beta_2 R_t^* + u_t$$

where M = demand for real cash balances, $Y^*$ = expected real income, and $R^*$ = expected interest rate. Assume that expectations are formulated as follows:

$$Y_t^* = \gamma_1 Y_t + (1-\gamma_1) Y_{t-1}^*$$
$$R_t^* = \gamma_2 R_t + (1-\gamma_2) R_{t-1}^*$$

where $\gamma_1$ and $\gamma_2$ are coefficients of expectation, both lying between 0 and 1.

a. How would you express $M_t$ in terms of the observable quantities?

b. What estimation problems do you foresee?

*17.10. If you estimate Eq. (17.7.2) by OLS, can you derive estimates of the original parameters? What problems do you foresee? (For details, see Roger N. Waud.)†

*Optional.

†"Misspecification in the 'Partial Adjustment' and 'Adaptive Expectations' Models," International Economic Review, vol. 9, no. 2, June 1968, pp. 204-217.

17.11. Serial correlation model. Consider the following model:

$$Y_t = \alpha + \beta X_t + u_t$$

Assume that $u_t$ follows the Markov first-order autoregressive scheme given in Chapter 12, namely,

$$u_t = \rho u_{t-1} + \varepsilon_t$$

where $\rho$ is the coefficient of (first-order) autocorrelation and where $\varepsilon_t$ satisfies all the assumptions of the classical OLS. Then, as shown in Chapter 12, the model

$$Y_t = \alpha(1-\rho) + \beta(X_t - \rho X_{t-1}) + \rho Y_{t-1} + \varepsilon_t$$

will have a serially independent error term, making OLS estimation possible. But this model, called the serial correlation model, very much resembles the Koyck, adaptive expectations, and partial adjustment models. How would you know in any given situation which of the preceding models is appropriate?*

17.12. Consider the Koyck (or, for that matter, the adaptive expectations) model given in Eq. (17.4.7), namely,

$$Y_t = \alpha(1-\lambda) + \beta_0 X_t + \lambda Y_{t-1} + (u_t - \lambda u_{t-1})$$

Suppose in the original model $u_t$ follows the first-order autoregressive scheme $u_t - \rho u_{t-1} = \varepsilon_t$, where $\rho$ is the coefficient of autocorrelation and where $\varepsilon_t$ satisfies all the classical OLS assumptions.

a. If $\rho = \lambda$, can the Koyck model be estimated by OLS?

b. Will the estimates thus obtained be unbiased? Consistent? Why or why not?

c. How reasonable is it to assume that $\rho = \lambda$?

17.13. Triangular, or arithmetic, distributed-lag models.† This model assumes that the stimulus (explanatory variable) exerts its greatest impact in the current time period and then declines by equal decrements to zero as one goes into the distant past. Geometrically, it is shown in Figure 17.9. Following this distribution, suppose we run the following succession of regressions:

$$Y_t = \alpha + \beta\left(\frac{2X_t + X_{t-1}}{3}\right)$$
$$Y_t = \alpha + \beta\left(\frac{3X_t + 2X_{t-1} + X_{t-2}}{6}\right)$$
$$Y_t = \alpha + \beta\left(\frac{4X_t + 3X_{t-1} + 2X_{t-2} + X_{t-3}}{10}\right)$$

etc., and choose the regression that gives the highest $R^2$ as the "best" regression. Comment on this strategy.

FIGURE 17.9 Triangular or arithmetic lag scheme (Fisher's).

*For a discussion of the serial correlation model, see Zvi Griliches, "Distributed Lags: A Survey," Econometrica, vol. 35, no. 1, January 1967, p. 34.

†This model was proposed by Irving Fisher in "Note on a Short-Cut Method for Calculating Distributed Lags," International Statistical Bulletin, 1937, pp. 323-328.










17.14. From the quarterly data for the period 1950-1960, F. P. R. Brechling obtained the following demand function for labor for the British economy (the figures in parentheses are standard errors):*

$$\Delta E_t = 14.22 + 0.172\,Q_t - 0.028\,t - 0.0007\,t^2 - 0.297\,E_{t-1}$$
           (2.61)     (0.014)       (0.015)     (0.0002)       (0.033)

$$R^2 = 0.76 \qquad d = 1.37$$

where $\Delta E_t = E_t - E_{t-1}$
Q = output
t = time

The preceding equation was based on the assumption that the desired level of employment $E_t^*$ is a function of output, time, and time squared and on the hypothesis that $E_t - E_{t-1} = \delta(E_t^* - E_{t-1})$, where $\delta$, the coefficient of adjustment, lies between 0 and 1.

a. Interpret the preceding regression.

b. What is the value of $\delta$?

c. Derive the long-run demand function for labor from the estimated short-run demand function.

d. How would you test for serial correlation in the preceding model?

17.15. In studying the farm demand for tractors, Griliches used the following models 

$$T_t^* = \alpha X_{1,t-1}^{\beta_1} X_{2,t-1}^{\beta_2}$$

where $T^*$ = desired stock of tractors
$X_1$ = relative price of tractors
$X_2$ = interest rate

Using the stock adjustment model, he obtained the following results for the period 1921-1957:

$$\log T_t = \text{constant} - 0.218 \log X_{1,t-1} - 0.855 \log X_{2,t-1} + 0.864 \log T_{t-1}$$
                              (0.051)                  (0.170)                 (0.035)

$$R^2 = 0.987$$

where the figures in the parentheses are the estimated standard errors.

a. What is the estimated coefficient of adjustment?

b. What are the short- and long-run price elasticities?

c. What are the corresponding interest elasticities?

d. What are the reasons for a high or low rate of adjustment in the present model?

17.16. Whenever the lagged dependent variable appears as an explanatory variable, the $R^2$ is usually much higher than when it is not included. What are the reasons for this observation?


*F. P. R. Brechling, "The Relationship between Output and Employment in British Manufacturing 
Industries," Review of Economic Studies, vol. 32, July 1965. 

*Zvi Griliches, "The Demand for a Durable Input: Farm Tractors in the United States, 1921-1957," in Arnold C. Harberger, ed., The Demand for Durable Goods, University of Chicago Press, Chicago, 1960.
FIGURE 17.10 Hypothetical lag structures (two panels; horizontal axis: time).

17.17. Consider the lag patterns in Figure 17.10. What degree polynomials would you fit 
to the lag structures and why? 

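17.18. Consider Eq. (17.13.4):

$$\beta_i = a_0 + a_1 i + a_2 i^2 + \cdots + a_m i^m$$

To obtain the variance of $\hat\beta_i$ from the variances of $\hat a_j$, we use the following formula:

$$\operatorname{var}(\hat\beta_i) = \operatorname{var}(\hat a_0 + \hat a_1 i + \hat a_2 i^2 + \cdots + \hat a_m i^m) = \sum_{j=0}^{m} i^{2j}\operatorname{var}(\hat a_j) + 2\sum_{j<p} i^{(j+p)}\operatorname{cov}(\hat a_j, \hat a_p)$$

a. Using the preceding formula, find the variance of $\hat\beta_i$ expressed as

$$\hat\beta_i = \hat a_0 + \hat a_1 i + \hat a_2 i^2$$
$$\hat\beta_i = \hat a_0 + \hat a_1 i + \hat a_2 i^2 + \hat a_3 i^3$$

b. If the variances of $\hat a_j$ are large relative to the $\hat a_j$ themselves, will the variance of $\hat\beta_i$ be large also? Why or why not?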

17.19. Consider the following distributed-lag model:

$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \beta_3 X_{t-3} + \beta_4 X_{t-4} + u_t$$

Assume that $\beta_i$ can be adequately expressed by the second-degree polynomial as follows:

$$\beta_i = a_0 + a_1 i + a_2 i^2$$

How would you estimate the $\beta$'s if we want to impose the restriction that $\beta_0 = \beta_4 = 0$?

17.20. The inverted V distributed-lag model. Consider the k-period finite distributed-lag model

$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + u_t$$

F. DeLeeuw has proposed the structure for the $\beta$'s as in Figure 17.11, where the $\beta$'s follow the inverted V shape. Assuming for simplicity that k (the maximum length of the lag) is an even number, and further assuming that $\beta_0$ and $\beta_k$ are zero, DeLeeuw suggests the following scheme for the $\beta$'s:*

$$\beta_i = \begin{cases} i\beta, & 0 \le i \le k/2 \\ (k-i)\beta, & k/2 \le i < k \end{cases}$$

FIGURE 17.11 Inverted V distributed-lag model.

How would you use the DeLeeuw scheme to estimate the parameters of the preceding k-period distributed-lag model?

17.21. Refer to Exercise 12.15. Since the d value shown there is of little use in detecting 
(first-order) autocorrelation (why?), how would you test for autocorrelation in this 
case? 
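The Almon quantities in Exercises 17.18 and 17.19 can be computed mechanically. In the Almon approach one regresses $Y_t$ on the constructed variables $Z_{jt} = \sum_{i=0}^{k} i^j X_{t-i}$, $j = 0, \ldots, m$, and recovers $\hat\beta_i = \sum_j \hat a_j i^j$. The variance formula of Exercise 17.18 is the quadratic form $w'Vw$, with $w = (1, i, \ldots, i^m)$ and $V$ the estimated variance-covariance matrix of the $\hat a_j$. The following Python sketch assumes you already have $\hat a$ and $V$ from such a regression. The numbers are made up for illustration, and the function names are hypothetical.

```python
def almon_regressors(x, k, m):
    """Rows [Z_0t, ..., Z_mt] with Z_jt = sum_{i=0}^{k} i**j * x[t-i], for t = k, ..., len(x) - 1."""
    return [[sum(i ** j * x[t - i] for i in range(k + 1)) for j in range(m + 1)]
            for t in range(k, len(x))]

def almon_beta(a, i):
    """beta_i = a_0 + a_1*i + ... + a_m*i**m (0**0 == 1 in Python, so beta_0 = a_0)."""
    return sum(a_j * i ** j for j, a_j in enumerate(a))

def almon_beta_var(V, i):
    """var(beta_i) = sum_j i**(2j) var(a_j) + 2 * sum_{j<p} i**(j+p) cov(a_j, a_p)."""
    size = len(V)
    total = sum(i ** (2 * j) * V[j][j] for j in range(size))
    total += 2 * sum(i ** (j + p) * V[j][p] for j in range(size) for p in range(j + 1, size))
    return total

# Made-up second-degree example (m = 2) with a 4-period lag (k = 4)
a_hat = [0.5, 0.3, -0.08]
V_hat = [[0.040, -0.010, 0.0010],
         [-0.010, 0.008, -0.0015],
         [0.0010, -0.0015, 0.0004]]
for i in range(5):
    print(i, round(almon_beta(a_hat, i), 4), round(almon_beta_var(V_hat, i), 5))
```

Even when each $\operatorname{var}(\hat a_j)$ is modest, the weights $i^{2j}$ grow quickly with the lag i. The covariance terms can offset part of that growth, so both must be included.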

Empirical Exercises 

17.22. Consider the following model:

$$Y_t^* = \alpha + \beta_0 X_t + u_t$$

where $Y^*$ = desired, or long-run, business expenditure for new plant and equipment, $X_t$ = sales, and t = time. Using the stock adjustment model and the data given in Table 17.10, estimate the parameters of the long- and short-run demand function for expenditure on new plant and equipment.

TABLE 17.10 Investment in Fixed Plant and Equipment in Manufacturing (Y) and Manufacturing Sales ($X_2$), in Billions of Dollars, Seasonally Adjusted, United States, 1970-1991

Year   Plant Expenditure, Y   Sales, X2     Year   Plant Expenditure, Y   Sales, X2
1970    36.99                  52.805       1981   128.68                 168.129
1971    33.60                  55.906       1982   123.97                 163.351
1972    35.42                  63.027       1983   117.35                 172.547
1973    42.35                  72.931       1984   139.61                 190.682
1974    52.48                  84.790       1985   152.88                 194.538
1975    53.66                  86.589       1986   137.95                 194.657
1976    58.53                  98.797       1987   141.06                 206.326
1977    67.48                 113.201       1988   163.45                 223.541
1978    78.13                 126.905       1989   183.80                 232.724
1979    95.13                 143.936       1990   192.61                 239.459
1980   112.60                 154.391       1991   182.81                 235.142

Source: Economic Report of the President, 1993. Data on Y from Table B-52, p. 407; data on $X_2$ from Table B-53, p. 408.

How would you find out if there is serial correlation in the data?

*F. DeLeeuw, "The Demand for Capital Goods by Manufacturers: A Study of Quarterly Time Series," Econometrica, vol. 30, no. 3, July 1962, pp. 407-423.

17.23. Use the data of Exercise 17.22 but consider the following model:

$$Y_t^* = \beta_0 X_t^{\beta_1} e^{u_t}$$

Using the stock adjustment model (why?), estimate the short- and long-run elasticities of expenditure on new plant and equipment with respect to sales. Compare your results with those for Exercise 17.22. Which model would you choose and why? Is there serial correlation in the data? How do you know?

17.24. Use the data of Exercise 17.22 but assume that

$$Y_t = \alpha + \beta X_t^* + u_t$$

where $X^*$ are the desired sales. Estimate the parameters of this model and compare the results with those obtained in Exercise 17.22. How would you decide which is the appropriate model? On the basis of the h statistic, would you conclude there is serial correlation in the data?

17.25. Suppose someone convinces you that the relationship between business expenditure for new plant and equipment and sales is as follows:

$$Y_t^* = \alpha + \beta X_t^* + u_t$$

where $Y^*$ is desired expenditure and $X^*$ is desired or expected sales. Use the data given in Exercise 17.22 to estimate this model and comment on your results.

17.26. Using the data given in Exercise 17.22, determine whether plant expenditure 
Granger-causes sales or whether sales Granger-cause plant expenditure. Use up to 
six lags and comment on your results. What important conclusion do you draw 
from this exercise? 

17.27. Assume that sales in Exercise 17.22 have a distributed-lag effect on expenditure on plant and equipment. Fit a suitable Almon lag model to the data.

17.28. Reestimate Eq. (17.13.16) imposing (1) near-end restriction, (2) far-end restriction, and (3) both end restrictions and compare your results with those given in Eq. (17.13.16). What general conclusion do you draw?
TABLE 17.11 Investments, Sales, and Interest Rate, United States, 1960-1999

Observation  Investment  Sales     Interest    Observation  Investment  Sales     Interest
1960          4.9         60,827    4.41       1980          69.6       327,233   11.94
1961          5.2         61,159    4.35       1981          82.4       355,822   14.17
1962          5.7         65,662    4.33       1982          88.9       347,625   13.79
1963          6.5         68,995    4.26       1983         100.8       369,286   12.04
1964          7.3         73,682    4.40       1984         121.7       410,124   12.71
1965          8.5         80,283    4.49       1985         130.8       422,583   11.37
1966         10.6         87,187    5.13       1986         137.6       430,419    9.02
1967         11.2         90,820    5.51       1987         141.9       457,735    9.38
1968         11.9         96,685    6.18       1988         155.9       497,157    9.71
1969         14.6        105,690    7.03       1989         173.0       527,039    9.26
1970         16.7        108,221    8.04       1990         176.1       545,909    9.32
1971         17.3        116,895    7.39       1991         181.4       542,815    8.77
1972         19.3        131,081    7.21       1992         197.5       567,176    8.14
1973         23.0        153,677    7.44       1993         215.0       595,628    7.22
1974         26.8        177,912    8.57       1994         233.7       639,163    7.96
1975         28.2        182,198    8.83       1995         262.0       684,982    7.59
1976         32.4        204,150    8.43       1996         287.3       718,113    7.37
1977         38.6        229,513    8.02       1997         325.2       753,445    7.26
1978         48.3        260,320    8.73       1998         367.4       779,413    6.53
1979         58.6        297,701    9.63       1999         433.0       833,079    7.04

Notes: Investment = private fixed investment in information processing equipment and software, billions of dollars, seasonally adjusted.
Sales = sales in total manufacturing and trade, millions of dollars, seasonally adjusted.
Interest = Moody's Aaa corporate bond rate, %.

Source: Economic Report of the President, 2001, Tables B-18, B-57, and B-73.


17.29. Table 17.11 gives data on private fixed investment in information processing equipment and software (Y, in billions of dollars), sales in total manufacturing and trade ($X_2$, in millions of dollars), and interest rate ($X_3$, Moody's Aaa corporate bond rate, percent); data on Y and $X_2$ are seasonally adjusted.

a. Test for bilateral causality between Y and $X_2$, paying careful attention to the lag length.

b. Test for bilateral causality between Y and $X_3$, again paying careful attention to the lag length.

c. To allow for the distributed lag effect of sales on investment, suppose you decide to use the Almon lag technique. Show the estimated model, after paying due attention to the length of the lag as well as the degree of the polynomial.

17.30. Table 17.12 gives data on indexes of real compensation per hour (Y) and output per hour ($X_2$), with both indexes to base 1992 = 100, in the business sector of the U.S. economy for the period 1960-1999, as well as the civilian unemployment rate ($X_3$) for the same period.

a. How would you decide whether it is wage compensation that determines labor 
productivity or the other way round? 

b. Develop a suitable model to test your conjecture in (a), providing the usual statistics. 

c. Do you think the unemployment rate has any effect on wage compensation, and if 
so, how would you take that into account? Show the necessary statistical analysis. 
TABLE 17.12 Compensation, Productivity and Unemployment Rate, United States, 1960-1999

Observation  COMP    PRODUCT  UNRate    Observation  COMP    PRODUCT  UNRate
1960          60.0    48.8    5.5       1980          89.5    80.4    7.1
1961          61.8    50.6    6.7       1981          89.5    82.0    7.6
1962          63.9    52.9    5.5       1982          90.9    81.7    9.7
1963          65.4    55.0    5.7       1983          91.0    84.6    9.6
1964          67.9    57.5    5.2       1984          91.3    87.0    7.5
1965          69.4    59.6    4.5       1985          92.7    88.7    7.2
1966          71.9    62.0    3.8       1986          95.8    91.4    7.0
1967          73.8    63.4    3.8       1987          96.3    91.9    6.2
1968          76.3    65.4    3.6       1988          97.3    93.0    5.5
1969          77.4    65.7    3.5       1989          95.9    93.9    5.3
1970          78.9    67.0    4.9       1990          96.5    95.2    5.6
1971          80.4    69.9    5.9       1991          97.5    96.3    6.8
1972          82.7    72.2    5.6       1992         100.0   100.0    7.5
1973          84.5    74.5    4.9       1993          99.9   100.5    6.9
1974          83.5    73.2    5.6       1994          99.7   101.9    6.1
1975          84.4    75.8    8.5       1995          99.3   102.6    5.6
1976          86.8    78.5    7.7       1996          99.7   105.4    5.4
1977          87.9    79.8    7.1       1997         100.4   107.6    4.9
1978          89.5    80.7    6.1       1998         104.3   110.5    4.5
1979          89.7    80.7    5.8       1999         107.3   114.0    4.2

Notes: COMP = index of real compensation per hour (1992 = 100).
PRODUCT = index of output per hour (1992 = 100).
UNRate = civilian unemployment rate, %.

Source: Economic Report of the President, 2001, Table B-49, p. 332.


17.31. In a test of Granger causality, Christopher Sims exploits the fact that the future cannot cause the present.* To decide whether a variable Y causes a variable X, Sims suggests estimating the following pair of equations:

$$Y_t = \alpha_1 + \sum_i \beta_i X_{t-i} + \sum_i \gamma_i Y_{t-i} + \sum_i \lambda_i X_{t+i} + u_{1t} \qquad (1)$$

$$X_t = \alpha_2 + \sum_i \delta_i X_{t-i} + \sum_i \theta_i Y_{t-i} + \sum_i \omega_i Y_{t+i} + u_{2t} \qquad (2)$$

These regressions include the lagged, current, and future, or lead, values of the regressors; terms such as $X_{t+1}$, $X_{t+2}$, etc., are called lead terms.

If Y is to Granger-cause X, then there must be some relationship between Y and the lead, or future, values of X. Therefore, instead of testing the $\beta_i$, we should test the hypothesis that the lead coefficients $\lambda_i$ in Eq. (1) are all zero. If we reject this hypothesis, the causality then runs from Y to X, and not from X to Y, because the future cannot cause the present. Similar comments apply to Equation (2).


*C. A. Sims, "Money, Income, and Causality," American Economic Review, vol. 62, 1972, 
pp. 540-552. 
TABLE 17.13 Macroeconomic Data for the Greek Economy, 1960-1995

Year   PC       PDI      Grossinv   GNP      LTI
1960   107808   117179    29121     145458    8
1961   115147   127599    31476     161802    8
1962   120050   135007    34128     164674    8
1963   126115   142128    35996     181534    8.25
1964   137192   159649    43445     196586    9
1965   147707   172756    49003     214922    9
1966   157687   182366    50567     228040    9
1967   167528   195611    49770     240791    9
1968   179025   204470    60397     257226    8.75
1969   190089   222638    71653     282168    8
1970   206813   246819    70663     304420    8
1971   217212   269249    80558     327723    8
1972   232312   297266    92977     356886    8
1973   250057   335522   100093     383916    9
1974   251650   310231    74500     369325   11.83
1975   266884   327521    74660     390000   11.88
1976   281066   350427    79750     415491   11.5
1977   293928   366730    85950     431164   12
1978   310640   390189    91100     458675   13.46
1979   318817   406857    99121     476048   16.71
1980   319341   401942    92705     485108   21.25
1981   325851   419669    85750     484259   21.33
1982   338507   421716    84100     483879   20.5
1983   339425   417930    83000     481198   20.5
1984   345194   434696    78300     490881   20.5
1985   358671   456576    82360     502258   20.5
1986   361026   439654    77234     507199   20.5
1987   365473   438454    73315     505713   21.82
1988   378488   476345    79831     529460   22.89
1989   394942   492334    87873     546572   23.26
1990   403194   495939    96139     546982   27.62
1991   412458   513173    91726     566586   29.45
1992   420028   502520    93140     568582   28.71
1993   420585   523066    91292     569724   28.56
1994   426893   520728    93073     579846   27.44
1995   433723   518407    98470     588691   23.05

Note: All nominal data are expressed at constant market prices of year 1970 in millions of drachmas. Private disposable income is deflated by the consumption price deflator.

Source: H. R. Seddighi, K. A. Lawler, and A. V. Katos, Econometrics: A Practical Approach, Routledge, London, 2000, p. 158.


To carry out the Sims test, we estimate Eq. (1) without the lead terms (call it the restricted regression) and then estimate Eq. (1) with the lead terms (call it the unrestricted regression). Then we carry out the F test as indicated in Equation (8.7.9). If the F statistic is significant (say, at the 5% level), then we conclude that it is Y that Granger-causes X. Similar comments apply to Equation (2).

Which test do we choose, Granger or Sims? We can apply both tests.* The one factor that is in favor of the Granger test is that it uses fewer degrees of freedom because it does not use the lead terms. If the sample is not sufficiently large, we will have to use the Sims test cautiously.

Refer to the data given in Exercise 12.34. For pedagogical purposes, apply the Sims test of causality to determine whether it is sales that causes plant expenditure or vice versa. Use the last four years' data as the lead terms in your analysis.

*The choice between Granger and Sims causality tests is not clear. For further discussion of these tests, see G. Chamberlain, "The General Equivalence of Granger and Sims Causality," Econometrica, vol. 50, 1982, pp. 569-582.

17.32. Table 17.13 gives some macroeconomic data for the Greek economy for the years 1960-1995.

Consider the following consumption function:

$$\ln PC_t^* = \beta_1 + \beta_2 \ln PDI_t + \beta_3 LTI_t + u_t$$

where $PC_t^*$ = real desired private consumption expenditure at time t; $PDI_t$ = real private disposable income at time t; $LTI_t$ = long-term interest rate at time t; and ln stands for natural logarithm.

a. From the data given in Table 17.13, estimate the previous consumption function, stating clearly how you measured the real desired private consumption expenditure.

b. What econometric problems did you encounter in estimating the preceding consumption function? How did you resolve them? Explain fully.

17.33. Using the data in Table 17.13, develop a suitable model to explain the behavior of 
gross real investment in the Greek economy for the period 1960-1995. Look up any 
textbook on macroeconomics for the accelerator model of investment. 
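The Sims procedure described in Exercise 17.31 can be sketched the same way as a Granger test, except that the unrestricted regression adds lead terms. The following Python sketch is a minimal illustration, not a definitive implementation. It assumes that Eq. (1) contains the current value and first n_lags values of X, n_lags lags of Y, and n_leads leads of X. It keeps the sample fixed across the restricted and unrestricted regressions, so observations are lost at both ends of the sample. The simulated data, in which Y drives future X, and the helper names are assumptions for illustration; the sketch is not applied to any data set in this chapter.

```python
import random

def ols_coef(X, y):
    """OLS coefficients from the normal equations (X'X)b = X'y, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * v for r, v in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(a[row][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def rss(X, y):
    """Residual sum of squares of the OLS fit of y on the columns of X."""
    b = ols_coef(X, y)
    return sum((yt - sum(bi * xi for bi, xi in zip(b, row))) ** 2 for row, yt in zip(X, y))

def sims_f_stat(y, x, n_lags, n_leads):
    """F test of H0: all lead coefficients are zero in
    y_t = a + sum_{i=0..n_lags} b_i x_{t-i} + sum_{j=1..n_lags} g_j y_{t-j} + sum_{i=1..n_leads} l_i x_{t+i} + u_t.
    Rejecting H0 points to y Granger-causing x."""
    restricted, unrestricted, target = [], [], []
    for t in range(n_lags, len(y) - n_leads):  # same sample for both regressions
        base = ([1.0] + [x[t - i] for i in range(n_lags + 1)]
                + [y[t - j] for j in range(1, n_lags + 1)])
        leads = [x[t + i] for i in range(1, n_leads + 1)]
        restricted.append(base)
        unrestricted.append(base + leads)
        target.append(y[t])
    rss_r, rss_ur = rss(restricted, target), rss(unrestricted, target)
    n, k = len(target), len(unrestricted[0])
    return ((rss_r - rss_ur) / n_leads) / (rss_ur / (n - k))

# Simulated data (an assumption): y_{t-1} enters x_t, so y should Granger-cause x.
random.seed(7)
T = 200
y = [random.gauss(0, 1) for _ in range(T)]
x = [0.0] * T
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + random.gauss(0, 1)

f_eq1 = sims_f_stat(y, x, n_lags=2, n_leads=2)  # Eq. (1): leads of X in the Y equation
f_eq2 = sims_f_stat(x, y, n_lags=2, n_leads=2)  # Eq. (2): leads of Y in the X equation
print(round(f_eq1, 2), round(f_eq2, 2))
```

Under these assumptions, a large F for Eq. (1) together with a small F for Eq. (2) points to causality running from Y to X. The lead terms cost degrees of freedom and observations at the end of the sample, which is why Exercise 17.31 advises caution with the Sims test in small samples.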


Appendix 17A 


17A.1 The Sargan Test for the Validity of Instruments 

Suppose we use an instrumental variable(s) to replace an explanatory variable(s) that is correlated 
with the error term. How valid is the instrumental variable(s), that is, how do we know that the in¬ 
struments chosen are independent of the error term? Sargan has developed a statistic, dubbed SARG, 
to test the validity of the instruments used in instrumental variable(s) (IV).* The steps involved in 
SARG are as follows: + 

1. Divide the variables included in a regression equation into two groups, those that are independent 
of the error term (say, X\, X2, . . . , X p ) and those that are not independent of the error term (say, 
Z h Z 2 , . . ., Z q ). 

2. Let W\, W2, . . ., W s be the instruments chosen for the Z variables in 1, where s > q. 

3. Estimate the original regression, replacing the Z’s by the W s, that is, estimate the original 
regression by IV and obtain the residuals, say, u. 

4. Regress $\hat{u}$ on a constant, all the X variables, and all the W variables, but exclude all the
Z variables. Obtain $R^2$ from this regression.

5. Now compute the SARG statistic, defined as:

$$\text{SARG} = (n - k)R^2 \sim \chi^2_{s-q} \tag{17A.1.1}$$


*J. D. Sargan, "Wages and Prices in the United Kingdom: A Study in Econometric Methodology,"
in P. E. Hart, C. Mills, and J. K. Whitaker (eds.), Econometric Analysis for National Economic Planning,
Butterworths, London, 1964.

†The following discussion leans on H. R. Seddighi, K. A. Lawler, and A. V. Katos, Econometrics:
A Practical Approach, Routledge, New York, 2000, pp. 155-156.
where n = the number of observations and k is the number of coefficients in the original
regression equation. Under the null hypothesis that the instruments are exogenous, Sargan has
shown that the SARG statistic asymptotically has the $\chi^2$ distribution with (s - q) degrees of
freedom, where s is the number of instruments (i.e., the variables in W) and q is the number of
endogenous regressors (i.e., the Z variables) in the original equation. If the computed chi-square
value in an application is statistically significant, we reject the validity of the instruments. If it
is not statistically significant, we cannot reject the hypothesis that the chosen instruments are
valid. It should be emphasized that s must be greater than q, that is, there must be more
instruments than endogenous regressors. If that is not the case (i.e., s ≤ q), the SARG test cannot
be applied.

6. The null hypothesis is that all (W) instruments are valid. If the computed chi-square exceeds the 
critical chi-square value, we reject the null hypothesis, which means that at least one instrument 
is correlated with the error term and therefore the IV estimates based on the chosen instruments 
are not valid. 
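The six steps above can be sketched in code. What follows is a minimal, self-contained Python illustration, not the routine of any particular econometrics package: it simulates hypothetical data with one exogenous regressor X, one endogenous regressor Z, and two candidate instruments (so s - q = 1), carries out the IV step as two-stage least squares, and computes SARG = (n - k)R² as in Eq. (17A.1.1). The data-generating values, the helper names (`ols`, `fitted`, `r_squared`, `sargan`), and the choice of plain Python instead of a matrix library are assumptions made for illustration. Because s - q = 1 here, the chi-square tail probability can be obtained from `math.erfc`; other degrees of freedom would need a proper chi-square routine.

```python
import math
import random


def ols(y, cols):
    """OLS coefficients of y on the columns in `cols` (pass a column of ones
    for the intercept). Solves X'X b = X'y by Gaussian elimination."""
    k = len(cols)
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    b = [0.0] * k
    for c in reversed(range(k)):                             # back substitution
        b[c] = (a[c][k] - sum(a[c][j] * b[j] for j in range(c + 1, k))) / a[c][c]
    return b


def fitted(b, cols):
    """Fitted values sum_j b_j * cols[j][t] for each observation t."""
    return [sum(bj * col[t] for bj, col in zip(b, cols)) for t in range(len(cols[0]))]


def r_squared(y, cols):
    """Centered R^2 from the OLS regression of y on `cols`."""
    yhat = fitted(ols(y, cols), cols)
    ybar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - rss / tss


def sargan(y, xs, zs, ws):
    """Steps 3-5: IV (2SLS) fit, auxiliary regression, SARG = (n - k) R^2.
    xs: exogenous regressors, zs: endogenous regressors, ws: instruments."""
    n = len(y)
    exog = [[1.0] * n] + xs
    inst = exog + ws
    # Step 3 (as 2SLS): replace each Z by its projection on the X's and W's ...
    zhat = [fitted(ols(z, inst), inst) for z in zs]
    b = ols(y, exog + zhat)
    # ... but form the IV residuals with the actual Z's.
    resid = [yi - fi for yi, fi in zip(y, fitted(b, exog + zs))]
    # Step 4: regress the residuals on a constant, the X's, and the W's (no Z's).
    r2 = r_squared(resid, inst)
    k = len(b)                      # number of coefficients in the original equation
    return b, (n - k) * r2, len(ws) - len(zs)


# Hypothetical data: Z is endogenous because it shares the error u with y.
random.seed(42)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
w1 = [random.gauss(0, 1) for _ in range(n)]
w2 = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
z = [0.8 * a + 0.8 * c + 0.5 * e + random.gauss(0, 1) for a, c, e in zip(w1, w2, u)]
y = [1.0 + 2.0 * xi + 3.0 * zi + ui for xi, zi, ui in zip(x, z, u)]

b, stat, df = sargan(y, [x], [z], [w1, w2])
p_value = math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi-square with 1 df

# An invalid instrument: w2_bad is correlated with the error term u.
w2_bad = [c + e for c, e in zip(w2, u)]
_, stat_bad, _ = sargan(y, [x], [z], [w1, w2_bad])
print(f"valid instruments:  SARG = {stat:.2f}, p = {p_value:.3f}")
print(f"invalid instrument: SARG = {stat_bad:.2f}")
```

With valid instruments the statistic behaves like a single draw from a chi-square distribution with 1 degree of freedom; with the contaminated instrument it should land far beyond the 5 percent critical value of 3.84, which is the rejection described in step 6.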



Simultaneous-Equation Models and Time Series Econometrics



A casual look at the published empirical work in business and economics will reveal that 
many economic relationships are of the single-equation type. That is why we devoted the 
first three parts of this book to the discussion of single-equation regression models. In such 
models, one variable (the dependent variable Y ) is expressed as a linear function of one or 
more other variables (the explanatory variables, the X’s). In such models an implicit 
assumption is that the cause-and-effect relationship, if any, between Y and the X’s is
unidirectional: The explanatory variables are the cause and the dependent variable is the effect.

However, there are situations where there is a two-way flow of influence among economic 
variables; that is, one economic variable affects another economic variable(s) and is, in turn, 
affected by it (them). Thus, in the regression of money M on the rate of interest r, the
single-equation methodology assumes implicitly that the rate of interest is fixed (say, by the Federal
Reserve System) and tries to find out the response of money demanded to changes in the
level of the interest rate. But what happens if the rate of interest depends on the demand for
money? In this case, the conditional regression analysis made in this book thus far may
not be appropriate because now M depends on r and r depends on M. Thus, we need to
consider two equations, one relating M to r and another relating r to M. And this leads us
to consider simultaneous-equation models, models in which there is more than one
regression equation, one for each interdependent variable.

In Part 4 we present a very elementary and often heuristic introduction to the complex 
subject of simultaneous-equation models, the details being left for the references. 

In Chapter 18, we provide several examples of simultaneous-equation models and show 
why the method of ordinary least squares considered previously is generally inapplicable in 
estimating the parameters of each of the equations in the model. 

In Chapter 19, we consider the so-called identification problem. If in a system of
simultaneous equations containing two or more equations it is not possible to obtain
numerical values of each parameter in each equation because the equations are observationally
indistinguishable, or look too much like one another, then we have the identification
problem. Thus, in the regression of quantity Q on price P, is the resulting equation a
demand function or a supply function (for Q and P enter into both functions)? Therefore, if
we have data on Q and P only and no other information, it will be difficult if not
impossible to identify the regression as the demand or supply function. It is essential to resolve the
identification problem before we proceed to estimation because if we do not know what we
are estimating, estimation per se is meaningless. In Chapter 19 we offer various methods of
solving the identification problem.

In Chapter 20, we consider several estimation methods that are designed specifically for 
estimating the simultaneous-equation models and consider their merits and limitations. 


Chapter 18


Simultaneous-Equation Models



In this and the following two chapters we discuss the simultaneous-equation models. In 
particular, we discuss their special features, their estimation, and some of the statistical 
problems associated with them. 


18.1 The Nature of Simultaneous-Equation Models 


In Parts 1 to 3 of this text we were concerned exclusively with single-equation models, i.e., 
models in which there was a single dependent variable Y and one or more explanatory
variables, the X’s. In such models the emphasis was on estimating and/or predicting the
average value of Y conditional upon the fixed values of the X variables. The cause-and-effect
relationship, if any, in such models therefore ran from the X’s to the Y. 

But in many situations, such a one-way or unidirectional cause-and-effect relationship is 
not meaningful. This occurs if Y is determined by the X’s, and some of the X’s are, in turn, 
determined by Y. In short, there is a two-way, or simultaneous, relationship between Y and 
(some of) the X’s, which makes the distinction between dependent and explanatory
variables of dubious value. It is better to lump together a set of variables that can be determined
simultaneously by the remaining set of variables—precisely what is done in simultaneous- 
equation models. In such models there is more than one equation—one for each of the 
mutually, or jointly, dependent or endogenous variables. 1 And unlike the single-equation 
models, in the simultaneous-equation models one may not estimate the parameters of a 
single equation without taking into account information provided by other equations in the 
system. 

What happens if the parameters of each equation are estimated by applying, say, the 
method of ordinary least squares (OLS), disregarding other equations in the system? Recall 
that one of the crucial assumptions of the method of OLS is that the explanatory X variables 
are either nonstochastic or, if stochastic (random), distributed independently of the
stochastic disturbance term. If neither of these conditions is met, then, as shown later, the
least-squares estimators are not only biased but also inconsistent; that is, as the sample size 


1 In the context of the simultaneous-equation models, the jointly dependent variables are called
endogenous variables and the variables that are truly nonstochastic or can be so regarded are
called the exogenous, or predetermined, variables. (More on this in Chapter 19.)
increases indefinitely, the estimators do not converge to their true (population) values.
Thus, in the following hypothetical system of equations, 2 

$$Y_{1i} = \beta_{10} + \beta_{12} Y_{2i} + \gamma_{11} X_{1i} + u_{1i} \tag{18.1.1}$$

$$Y_{2i} = \beta_{20} + \beta_{21} Y_{1i} + \gamma_{21} X_{1i} + u_{2i} \tag{18.1.2}$$

where $Y_1$ and $Y_2$ are mutually dependent, or endogenous, variables and $X_1$ is an exogenous
variable and where $u_1$ and $u_2$ are the stochastic disturbance terms, the variables $Y_1$ and $Y_2$
are both stochastic. Therefore, unless it can be shown that the stochastic explanatory
variable $Y_2$ in Eq. (18.1.1) is distributed independently of $u_1$ and the stochastic explanatory
variable $Y_1$ in Eq. (18.1.2) is distributed independently of $u_2$, application of the classical
OLS to these equations individually will lead to inconsistent estimates.
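To see why $Y_2$ cannot be independent of $u_1$, one can substitute Eq. (18.1.1) into Eq. (18.1.2) and solve for $Y_{2i}$. The short worked derivation below assumes $\beta_{12}\beta_{21} \neq 1$ (so that the system has a unique solution) and, for the covariance step only, additionally assumes that $X_1$ is nonstochastic, that $u_1$ and $u_2$ are uncorrelated, and that $\operatorname{var}(u_{1i}) = \sigma_1^2$:

$$Y_{2i} = \frac{\beta_{20} + \beta_{21}\beta_{10}}{1 - \beta_{12}\beta_{21}} + \frac{\gamma_{21} + \beta_{21}\gamma_{11}}{1 - \beta_{12}\beta_{21}}\, X_{1i} + \frac{\beta_{21} u_{1i} + u_{2i}}{1 - \beta_{12}\beta_{21}}$$

$$\operatorname{cov}(Y_{2i}, u_{1i}) = \frac{\beta_{21}\,\sigma_1^2}{1 - \beta_{12}\beta_{21}}$$

Under these assumptions the covariance is nonzero whenever $\beta_{21} \neq 0$, that is, whenever $Y_1$ actually enters the second equation, so OLS applied to Eq. (18.1.1) by itself uses a regressor that is correlated with the disturbance.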

In the remainder of this chapter we give a few examples of simultaneous-equation models
and show the bias involved in the direct application of the least-squares method to such
models. After discussing the so-called identification problem in Chapter 19, in Chapter 20 
we discuss some of the special methods developed to handle the simultaneous-equation 
models. 

18.2 Examples of Simultaneous-Equation Models 


EXAMPLE 18.1 

Demand-and-Supply Model


As is well known, the price P of a commodity and the quantity Q sold are determined by
the intersection of the demand-and-supply curves for that commodity. Thus, assuming for
simplicity that the demand-and-supply curves are linear and adding the stochastic
disturbance terms $u_1$ and $u_2$, we may write the empirical demand-and-supply functions as:


Demand function:
$$Q_t^d = \alpha_0 + \alpha_1 P_t + u_{1t}, \qquad \alpha_1 < 0 \tag{18.2.1}$$

Supply function:
$$Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}, \qquad \beta_1 > 0 \tag{18.2.2}$$

Equilibrium condition:
$$Q_t^d = Q_t^s$$


where $Q^d$ = quantity demanded
$Q^s$ = quantity supplied
t = time


and the α's and β's are the parameters. A priori, $\alpha_1$ is expected to be negative
(downward-sloping demand curve), and $\beta_1$ is expected to be positive (upward-sloping supply
curve).

Now it is not too difficult to see that P and Q are jointly dependent variables. If, for
example, $u_{1t}$ in Eq. (18.2.1) changes because of changes in other variables affecting $Q_t^d$
(such as income, wealth, and tastes), the demand curve will shift upward if $u_{1t}$ is positive
and downward if $u_{1t}$ is negative. These shifts are shown in Figure 18.1.

As the figure shows, a shift in the demand curve changes both P and Q. Similarly, a
change in $u_{2t}$ (because of strikes, weather, import or export restrictions, etc.) will shift
the supply curve, again affecting both P and Q. Because of this simultaneous dependence
between Q and P, $u_{1t}$ and $P_t$ in Eq. (18.2.1) and $u_{2t}$ and $P_t$ in Eq. (18.2.2) cannot be
independent. Therefore, a regression of Q on P as in Eq. (18.2.1) would violate an
important assumption of the classical linear regression model, namely, the assumption of
no correlation between the explanatory variable(s) and the disturbance term.
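A small simulation makes the point concrete. The Python sketch below is an illustration under assumed values, not an estimation method: it picks hypothetical parameters ($\alpha_0 = 10$, $\alpha_1 = -1$, $\beta_0 = 2$, $\beta_1 = 1$) and independent standard normal disturbances $u_1$ and $u_2$, solves the equilibrium condition for $P_t$ in each period, and then regresses Q on P by OLS. Because both curves shift, the fitted slope should recover neither the demand slope $\alpha_1$ nor the supply slope $\beta_1$.

```python
import random


def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


# Hypothetical structural parameters (alpha1 < 0 < beta1, as in the text).
alpha0, alpha1 = 10.0, -1.0   # demand: Q = alpha0 + alpha1*P + u1
beta0, beta1 = 2.0, 1.0       # supply: Q = beta0 + beta1*P + u2

random.seed(1)
prices, quantities = [], []
for _ in range(20000):
    u1, u2 = random.gauss(0, 1), random.gauss(0, 1)
    # Equilibrium: alpha0 + alpha1*P + u1 = beta0 + beta1*P + u2, solved for P.
    p = (beta0 - alpha0 + u2 - u1) / (alpha1 - beta1)
    q = alpha0 + alpha1 * p + u1      # identical to beta0 + beta1*p + u2
    prices.append(p)
    quantities.append(q)

slope = ols_slope(prices, quantities)
print(f"OLS slope of Q on P: {slope:.3f} (demand {alpha1}, supply {beta1})")
```

Each observed (P, Q) pair is an intersection of a shifted demand curve and a shifted supply curve, so the scatter traces out neither curve; how the OLS slope depends on the two disturbance variances is the subject of Exercise 18.3.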


2 These economical but self-explanatory notations will be generalized to more than two equations in 
Chapter 19. 
EXAMPLE 18.2

Keynesian Model of Income Determination


Consider the simple Keynesian model of income determination: 

Consumption function:
$$C_t = \beta_0 + \beta_1 Y_t + u_t, \qquad 0 < \beta_1 < 1 \tag{18.2.3}$$

Income identity:
$$Y_t = C_t + I_t \quad (= C_t + S_t) \tag{18.2.4}$$


where C = consumption expenditure
Y = income
I = investment (assumed exogenous)
S = savings
t = time
u = stochastic disturbance term
$\beta_0$ and $\beta_1$ = parameters


The parameter $\beta_1$ is known as the marginal propensity to consume (MPC) (the amount
of extra consumption expenditure resulting from an extra dollar of income). From
economic theory, $\beta_1$ is expected to lie between 0 and 1. Equation (18.2.3) is the (stochastic)
consumption function; and Eq. (18.2.4) is the national income identity, signifying that total
income is equal to total consumption expenditure plus total investment expenditure, it
being understood that total investment expenditure is equal to total savings.
Diagrammatically, we have Figure 18.2.

From the postulated consumption function and Figure 18.2 it is clear that C and Y
are interdependent and that $Y_t$ in Eq. (18.2.3) is not expected to be independent of the
disturbance term because when $u_t$ shifts (because of a variety of factors subsumed in the
error term), then the consumption function also shifts, which, in turn, affects $Y_t$. Therefore,
once again the classical least-squares method is inapplicable to Eq. (18.2.3). If applied, the
estimators thus obtained will be inconsistent, as we shall show later.



EXAMPLE 18.3

Wage-Price Models

Consider the following Phillips-type model of money-wage and price determination:

$$W_t = \alpha_0 + \alpha_1 UN_t + \alpha_2 P_t + u_{1t} \tag{18.2.5}$$

$$P_t = \beta_0 + \beta_1 W_t + \beta_2 R_t + \beta_3 M_t + u_{2t} \tag{18.2.6}$$

where W = rate of change of money wages
UN = unemployment rate, %
P = rate of change of prices
R = rate of change of cost of capital
M = rate of change of price of imported raw material
t = time
$u_1$, $u_2$ = stochastic disturbances

Since the price variable P enters into the wage equation and the wage variable W enters
into the price equation, the two variables are jointly dependent. Therefore, these stochastic
explanatory variables are expected to be correlated with the relevant stochastic
disturbances, once again rendering the classical OLS method inapplicable to estimate the
parameters of the two equations individually.
EXAMPLE 18.4

The IS Model of Macroeconomics


FIGURE 18.3 The IS curve.


The celebrated IS, or goods market equilibrium, model of macroeconomics 3 in its
nonstochastic form can be expressed as:


Consumption function:
$$C_t = \beta_0 + \beta_1 Y_{dt}, \qquad 0 < \beta_1 < 1 \tag{18.2.7}$$

Tax function:
$$T_t = \alpha_0 + \alpha_1 Y_t, \qquad 0 < \alpha_1 < 1 \tag{18.2.8}$$

Investment function:
$$I_t = \gamma_0 + \gamma_1 r_t \tag{18.2.9}$$

Definition:
$$Y_{dt} = Y_t - T_t \tag{18.2.10}$$

Government expenditure:
$$G_t = \bar{G} \tag{18.2.11}$$

National income identity:
$$Y_t = C_t + I_t + G_t \tag{18.2.12}$$


where Y = national income
C = consumption spending
I = planned or desired net investment
$\bar{G}$ = given level of government expenditure
T = taxes
$Y_d$ = disposable income
r = interest rate


If you substitute Eqs. (18.2.10) and (18.2.8) into Eq. (18.2.7) and substitute the
resulting equation for C and Eqs. (18.2.9) and (18.2.11) into Eq. (18.2.12), you should obtain
the IS equation:


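$$Y_t = \pi_0 + \pi_1 r_t \tag{18.2.13}$$

where

$$\pi_0 = \frac{\beta_0 - \alpha_0\beta_1 + \gamma_0 + \bar{G}}{1 - \beta_1(1 - \alpha_1)}, \qquad \pi_1 = \frac{\gamma_1}{1 - \beta_1(1 - \alpha_1)} \tag{18.2.14}$$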


Equation (18.2.13) is the equation of the IS, or goods market equilibrium, that is, it gives 
the combinations of the interest rate and level of income such that the goods market 
clears or is in equilibrium. Geometrically, the IS curve is shown in Figure 18.3. 
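As a check on the algebra behind Eqs. (18.2.13) and (18.2.14), the short Python sketch below computes $\pi_0$ and $\pi_1$ from a set of hypothetical parameter values (chosen only for illustration, respecting $0 < \beta_1 < 1$ and $0 < \alpha_1 < 1$). It then picks several interest rates, rebuilds T, $Y_d$, C, and I from the structural equations at each point on the implied IS line, and confirms that the national income identity (18.2.12) holds there.

```python
def is_coefficients(beta0, beta1, alpha0, alpha1, gamma0, gamma1, g_bar):
    """Intercept pi0 and slope pi1 of the IS equation Y = pi0 + pi1 * r."""
    denom = 1.0 - beta1 * (1.0 - alpha1)
    pi0 = (beta0 - alpha0 * beta1 + gamma0 + g_bar) / denom
    pi1 = gamma1 / denom
    return pi0, pi1


# Hypothetical parameter values (not taken from the text).
beta0, beta1 = 50.0, 0.8        # consumption function (18.2.7)
alpha0, alpha1 = 10.0, 0.25     # tax function (18.2.8)
gamma0, gamma1 = 100.0, -10.0   # investment function (18.2.9)
g_bar = 200.0                   # government expenditure (18.2.11)

pi0, pi1 = is_coefficients(beta0, beta1, alpha0, alpha1, gamma0, gamma1, g_bar)

gaps = []
for r in (2.0, 5.0, 8.0):
    y = pi0 + pi1 * r                  # point on the IS curve (18.2.13)
    t = alpha0 + alpha1 * y            # taxes (18.2.8)
    yd = y - t                         # disposable income (18.2.10)
    c = beta0 + beta1 * yd             # consumption (18.2.7)
    i = gamma0 + gamma1 * r            # investment (18.2.9)
    gaps.append(y - (c + i + g_bar))   # zero if (18.2.12) holds

print(pi0, pi1, max(abs(g) for g in gaps))
```

With $\gamma_1 < 0$ the slope $\pi_1$ is negative, which is the downward-sloping IS curve of Figure 18.3.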



3 "The goods market equilibrium schedule, or IS schedule, shows combinations of interest rates and 
levels of output such that planned spending equals income." See Rudiger Dornbusch and Stanley 
Fischer, Macroeconomics, 3d ed., McGraw-Hill, New York, 1984, p. 102. Note that for simplicity we 
have assumed away the foreign trade sector. 


What would happen if we were to estimate, say, the consumption function (18.2.7) in
isolation? Could we obtain unbiased and/or consistent estimates of $\beta_0$ and $\beta_1$? Such a
result is unlikely because consumption depends on disposable income, which depends on
national income Y, but the latter depends on r and $\bar{G}$ as well as the other parameters
entering in $\pi_0$. Therefore, unless we take into account all these influences, a simple
regression of C on $Y_d$ is bound to give biased and/or inconsistent estimates of $\beta_0$ and $\beta_1$.


EXAMPLE 18.5

The LM Model

The other half of the famous IS-LM paradigm is the LM, or money market equilibrium,
relation, which gives the combinations of the interest rate and level of income such that the
money market is cleared, that is, the demand for money is equal to its supply.
Algebraically, the model, in the nonstochastic form, may be expressed as:

Money demand function:
$$M_t^d = a + bY_t - c\,r_t \tag{18.2.15}$$

Money supply function:
$$M_t^s = \bar{M} \tag{18.2.16}$$

Equilibrium condition:
$$M_t^d = M_t^s \tag{18.2.17}$$

where Y = income, r = interest rate, and $\bar{M}$ = assumed level of money supply, say,
determined by the Fed.

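Equating the money demand and supply functions and simplifying, we obtain the LM
equation:

$$Y_t = \lambda_0 + \lambda_1 \bar{M} + \lambda_2 r_t \tag{18.2.18}$$

where

$$\lambda_0 = -\frac{a}{b}, \qquad \lambda_1 = \frac{1}{b}, \qquad \lambda_2 = \frac{c}{b} \tag{18.2.19}$$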
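The coefficients in Eq. (18.2.19) follow from one line of algebra. Substituting Eqs. (18.2.15) and (18.2.16) into the equilibrium condition (18.2.17) and solving for $Y_t$, assuming $b \neq 0$, gives

$$a + bY_t - c\,r_t = \bar{M} \quad\Longrightarrow\quad Y_t = -\frac{a}{b} + \frac{1}{b}\,\bar{M} + \frac{c}{b}\,r_t$$

which is Eq. (18.2.18) with $\lambda_0 = -a/b$, $\lambda_1 = 1/b$, and $\lambda_2 = c/b$. If, as is usually assumed, b and c are both positive (money demand rising with income and falling with the interest rate), then $\lambda_2 > 0$, so the LM curve slopes upward in the (Y, r) plane.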

For a given $M = \bar{M}$, the LM curve representing the relation (18.2.18) is as shown in
Figure 18.4.

The IS and LM curves show, respectively, that a whole array of interest rates is
consistent with goods market equilibrium and a whole array of interest rates is compatible with
equilibrium in the money market. Of course, only one interest rate and one level of 
income will be consistent simultaneously with the two equilibria. To obtain these, all that 
needs to be done is to equate Eqs. (18.2.13) and (18.2.18). In Exercise 18.4 you are asked 
to show the level of the interest rate and income that is simultaneously compatible with 
the goods and money market equilibrium. 


FIGURE 18.4 The LM curve, LM($M = \bar{M}$).
EXAMPLE 18.6

Econometric Models

An extensive use of simultaneous-equation models has been made in the econometric
models built by several econometricians. An early pioneer in this field was Professor
Lawrence Klein of the Wharton School of the University of Pennsylvania. His initial model,
known as Klein's model I, is as follows:

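$$
\begin{aligned}
&\text{Consumption function:} && C_t = \beta_0 + \beta_1 P_t + \beta_2 (W + W')_t + \beta_3 P_{t-1} + u_{1t} \\
&\text{Investment function:} && I_t = \beta_4 + \beta_5 P_t + \beta_6 P_{t-1} + \beta_7 K_{t-1} + u_{2t} \\
&\text{Demand for labor:} && W_t = \beta_8 + \beta_9 (Y + T - W')_t + \beta_{10} (Y + T - W')_{t-1} + \beta_{11} t + u_{3t} \\
&\text{Identity:} && Y_t + T_t = C_t + I_t + G_t \\
&\text{Identity:} && Y_t = W'_t + W_t + P_t \\
&\text{Identity:} && K_t = K_{t-1} + I_t
\end{aligned}
\tag{18.2.20}
$$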


where C = consumption expenditure
I = investment expenditure
G = government expenditure
P = profits
W = private wage bill
W' = government wage bill
K = capital stock
T = taxes
Y = income after tax
t = time
$u_1$, $u_2$, and $u_3$ = stochastic disturbances 4


In the preceding model the variables C, I, W, Y, P, and K are treated as jointly dependent,
or endogenous, variables and the variables $P_{t-1}$, $K_{t-1}$, and $Y_{t-1}$ are treated as
predetermined. 5 In all, there are six equations (including the three identities) to study the
interdependence of six endogenous variables.

In Chapter 20 we shall see how such econometric models are estimated. For the time 
being, note that because of the interdependence among the endogenous variables, in 
general they are not independent of the stochastic disturbance terms, which therefore 
makes it inappropriate to apply the method of OLS to an individual equation in the
system. As shown in Section 18.3, the estimators thus obtained are inconsistent; they do not
converge to their true population values even when the sample size is very large. 


18.3 The Simultaneous-Equation Bias: Inconsistency of OLS Estimators

As stated previously, the method of least squares may not be applied to estimate a single
equation embedded in a system of simultaneous equations if one or more of the explanatory
variables are correlated with the disturbance term in that equation because the estimators
thus obtained are inconsistent. To show this, let us revert to the simple Keynesian
model of income determination given in Example 18.2. Suppose that we want to estimate
the parameters of the consumption function (18.2.3). Assuming that $E(u_t) = 0$,
$E(u_t^2) = \sigma^2$, $E(u_t u_{t+j}) = 0$ (for $j \neq 0$), and $\operatorname{cov}(I_t, u_t) = 0$, which are the assumptions of
the classical linear regression model, we first show that $Y_t$ and $u_t$ in (18.2.3) are correlated
and then prove that $\hat{\beta}_1$ is an inconsistent estimator of $\beta_1$.


4 L. R. Klein, Economic Fluctuations in the United States, 1921-1941, John Wiley & Sons, New York, 1950.

5 The model builder will have to specify which of the variables in a model are endogenous and which
are predetermined. $K_{t-1}$ and $Y_{t-1}$ are predetermined because at time t their values are known.
(More on this in Chapter 19.)

To prove that $Y_t$ and $u_t$ are correlated, we proceed as follows. Substitute Eq. (18.2.3) into
Eq. (18.2.4) to obtain

$$Y_t = \beta_0 + \beta_1 Y_t + u_t + I_t$$

that is,

$$Y_t = \frac{\beta_0}{1-\beta_1} + \frac{1}{1-\beta_1} I_t + \frac{1}{1-\beta_1} u_t \tag{18.3.1}$$

Now

$$E(Y_t) = \frac{\beta_0}{1-\beta_1} + \frac{1}{1-\beta_1} I_t \tag{18.3.2}$$

where use is made of the fact that $E(u_t) = 0$ and that $I_t$, being exogenous, or predetermined
(because it is fixed in advance), has as its expected value $I_t$.

Therefore, subtracting Eq. (18.3.2) from Eq. (18.3.1) results in

$$Y_t - E(Y_t) = \frac{u_t}{1-\beta_1} \tag{18.3.3}$$

Moreover,

$$u_t - E(u_t) = u_t \qquad \text{(Why?)} \tag{18.3.4}$$

whence

$$
\begin{aligned}
\operatorname{cov}(Y_t, u_t) &= E[Y_t - E(Y_t)][u_t - E(u_t)] \\
&= \frac{E(u_t^2)}{1-\beta_1} \qquad \text{from Eqs. (18.3.3) and (18.3.4)} \\
&= \frac{\sigma^2}{1-\beta_1}
\end{aligned}
\tag{18.3.5}
$$

Since $\sigma^2$ is positive by assumption (why?), the covariance between Y and u given in
Eq. (18.3.5) is bound to be different from zero. 6 As a result, $Y_t$ and $u_t$ in Eq. (18.2.3) are
expected to be correlated, which violates the assumption of the classical linear regression
model that the disturbances are independent of, or at least uncorrelated with, the explanatory
variables. As noted previously, the OLS estimators in this situation are inconsistent.

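To show that the OLS estimator $\hat{\beta}_1$ is an inconsistent estimator of $\beta_1$ because of
correlation between $Y_t$ and $u_t$, we proceed as follows:

$$\hat{\beta}_1 = \frac{\sum (C_t - \bar{C})(Y_t - \bar{Y})}{\sum (Y_t - \bar{Y})^2} = \frac{\sum c_t y_t}{\sum y_t^2} = \frac{\sum C_t y_t}{\sum y_t^2} \tag{18.3.6}$$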


6 It will be greater than zero as long as $\beta_1$, the MPC, lies between 0 and 1, and it will be negative
if $\beta_1$ is greater than unity. Of course, a value of MPC greater than unity would not make much
economic sense. In reality therefore the covariance between $Y_t$ and $u_t$ is expected to be positive.
where the lowercase letters, as usual, indicate deviations from the (sample) mean values.
Substituting for $C_t$ from Eq. (18.2.3), we obtain

$$\hat{\beta}_1 = \frac{\sum (\beta_0 + \beta_1 Y_t + u_t)\, y_t}{\sum y_t^2} = \beta_1 + \frac{\sum y_t u_t}{\sum y_t^2} \tag{18.3.7}$$

where in the last step use is made of the fact that $\sum y_t = 0$ and $\left(\sum Y_t y_t / \sum y_t^2\right) = 1$
(why?).

If we take the expectation of Eq. (18.3.7) on both sides, we obtain

$$E(\hat{\beta}_1) = \beta_1 + E\left(\frac{\sum y_t u_t}{\sum y_t^2}\right) \tag{18.3.8}$$

Unfortunately, we cannot evaluate $E\left(\sum y_t u_t / \sum y_t^2\right)$, since the expectations operator is a
linear operator and the expectation of a ratio is not the ratio of the expectations. [Note:
$E(A/B) \neq E(A)/E(B)$.] But intuitively it should be clear that unless the term $\left(\sum y_t u_t / \sum y_t^2\right)$ is
zero, $\hat{\beta}_1$ is a biased estimator of $\beta_1$. But have we not shown in Eq. (18.3.5) that the
covariance between Y and u is nonzero and therefore would $\hat{\beta}_1$ not be biased? The answer
is, not quite, since $\operatorname{cov}(Y_t, u_t)$, a population concept, is not quite $\sum y_t u_t / n$, which is a sample
measure, although as the sample size increases indefinitely the latter will tend toward the
former. But if the sample size increases indefinitely, then we can resort to the concept of
consistent estimator and find out what happens to $\hat{\beta}_1$ as n, the sample size, increases
indefinitely. In short, when we cannot explicitly evaluate the expected value of an
estimator, as in Eq. (18.3.8), we can turn our attention to its behavior in the large sample.

Now an estimator is said to be consistent if its probability limit, 7 or plim for short, is
equal to its true (population) value. Therefore, to show that $\hat{\beta}_1$ of Eq. (18.3.7) is
inconsistent, we must show that its plim is not equal to the true $\beta_1$. Applying the rules of
probability limit to Eq. (18.3.7), we obtain: 8

$$
\begin{aligned}
\operatorname{plim}(\hat{\beta}_1) &= \operatorname{plim}(\beta_1) + \operatorname{plim}\left(\frac{\sum y_t u_t}{\sum y_t^2}\right) \\
&= \operatorname{plim}(\beta_1) + \operatorname{plim}\left(\frac{\sum y_t u_t / n}{\sum y_t^2 / n}\right) \\
&= \beta_1 + \frac{\operatorname{plim}\left(\sum y_t u_t / n\right)}{\operatorname{plim}\left(\sum y_t^2 / n\right)}
\end{aligned}
\tag{18.3.9}
$$

where in the second step we have divided $\sum y_t u_t$ and $\sum y_t^2$ by the total number of
observations in the sample n so that the quantities in the parentheses are now the sample
covariance between Y and u and the sample variance of Y, respectively.

In words, Eq. (18.3.9) states that the probability limit of $\hat{\beta}_1$ is equal to true $\beta_1$ plus the ratio
of the plim of the sample covariance between Y and u to the plim of the sample variance of Y.
Now as the sample size n increases indefinitely, one would expect the sample covariance
between Y and u to approximate the true population covariance $E[Y_t - E(Y_t)][u_t - E(u_t)]$,
which from Eq. (18.3.5) is equal to $\sigma^2/(1 - \beta_1)$. Similarly, as n tends to infinity, the sample
variance of Y will approximate its population variance, say $\sigma_Y^2$. Therefore, Eq. (18.3.9) may
be written as

$$\operatorname{plim}(\hat{\beta}_1) = \beta_1 + \frac{\sigma^2 / (1 - \beta_1)}{\sigma_Y^2} \tag{18.3.10}$$

Given that $0 < \beta_1 < 1$ and that $\sigma^2$ and $\sigma_Y^2$ are both positive, it is obvious from Eq. (18.3.10)
that plim($\hat{\beta}_1$) will always be greater than $\beta_1$; that is, $\hat{\beta}_1$ will overestimate the true $\beta_1$. 9 In
other words, $\hat{\beta}_1$ is a biased estimator, and the bias will not disappear, no matter how large
the sample size; that is, $\hat{\beta}_1$ is inconsistent.


7 See Appendix A for the definition of probability limit.

8 As stated in Appendix A, the plim of a constant (for example, $\beta_1$) is the same constant and the
plim of (A/B) = plim(A)/plim(B). Note, however, that $E(A/B) \neq E(A)/E(B)$.
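Equation (18.3.10) can be checked by simulation. The Python sketch below is illustrative only and rests on assumptions that are not in the text: it takes $\beta_0 = 2$, $\beta_1 = 0.8$, and $\sigma^2 = 0.04$, but instead of a fixed investment series it draws $I_t$ independently of $u_t$ from a uniform distribution on [2.0, 3.8], so that $\sigma_Y^2 = (\operatorname{var}(I) + \sigma^2)/(1 - \beta_1)^2$. It generates Y from Eq. (18.3.1) and C from Eq. (18.2.3) for a very large sample and compares the OLS slope with the probability limit implied by Eq. (18.3.10).

```python
import random


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


beta0, beta1, sigma2 = 2.0, 0.8, 0.04
var_i = (3.8 - 2.0) ** 2 / 12            # variance of a uniform(2.0, 3.8) draw

rng = random.Random(7)
n = 100_000
inv = [rng.uniform(2.0, 3.8) for _ in range(n)]       # assumed: I independent of u
u = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
y = [(beta0 + i + e) / (1.0 - beta1) for i, e in zip(inv, u)]   # Eq. (18.3.1)
c = [beta0 + beta1 * yt + e for yt, e in zip(y, u)]             # Eq. (18.2.3)

slope = ols_slope(y, c)
var_y = (var_i + sigma2) / (1.0 - beta1) ** 2
plim = beta1 + (sigma2 / (1.0 - beta1)) / var_y                 # Eq. (18.3.10)
print(f"OLS slope {slope:.4f}, plim from (18.3.10) {plim:.4f}, true beta1 {beta1}")
```

Rerunning with a larger n should leave the OLS slope clustered around the plim of roughly 0.826 rather than moving it toward 0.8, which is what inconsistency means in practice.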

18.4 The Simultaneous-Equation Bias: A Numerical Example 


To demonstrate some of the points made in the preceding section, let us return to the
simple Keynesian model of income determination given in Example 18.2 and carry out the
following Monte Carlo study. 10 Assume that the values of investment I are as shown in
column 3 of Table 18.1. Further assume that


$$
\begin{aligned}
E(u_t) &= 0 \\
E(u_t u_{t+j}) &= 0 \qquad (j \neq 0) \\
\operatorname{var}(u_t) &= \sigma^2 = 0.04 \\
\operatorname{cov}(u_t, I_t) &= 0
\end{aligned}
$$

The $u_t$ thus generated are shown in column 4.

For the consumption function (18.2.3) assume that the values of the true parameters are
known and are $\beta_0 = 2$ and $\beta_1 = 0.8$.

From the assumed values of $\beta_0$ and $\beta_1$ and the generated values of $u_t$, we can generate
the values of income $Y_t$ from Eq. (18.3.1), which are shown in column 1 of Table 18.1.
Once $Y_t$ are known, and knowing $\beta_0$, $\beta_1$, and $u_t$, one can easily generate the values of
consumption $C_t$ from Eq. (18.2.3). The C’s thus generated are given in column 2.

Since the true $\beta_0$ and $\beta_1$ are known, and since our sample errors are exactly the same as
the “true” errors (because of the way we designed the Monte Carlo study), if we use the
data of Table 18.1 to regress $C_t$ on $Y_t$ we should obtain $\hat{\beta}_0 = 2$ and $\hat{\beta}_1 = 0.8$, if OLS were
unbiased. But from Eq. (18.3.7) we know that this will not be the case if the regressor $Y_t$
and the disturbance $u_t$ are correlated. Now it is not too difficult to verify from our data that
the (sample) covariance between $Y_t$ and $u_t$, as measured by $\sum y_t u_t$, is 3.8 and that $\sum y_t^2 = 184$.
Then, as Eq. (18.3.7) shows, we should have


$$\hat{\beta}_1 = \beta_1 + \frac{\sum y_t u_t}{\sum y_t^2} = 0.8 + \frac{3.8}{184} = 0.82065 \tag{18.4.1}$$

That is, $\hat{\beta}_1$ is upward-biased by 0.02065.


9 In general, however, the direction of the bias will depend on the structure of the particular model
and the true values of the regression coefficients.

10 This is borrowed from Kenneth J. White, Nancy C. Horsman, and Justin B. Wyatt, SHAZAM: Computer
Handbook for Econometrics for Use with Basic Econometrics, McGraw-Hill, New York, 1985, pp. 131-134.
TABLE 18.1

     Y_t         C_t        I_t       u_t
     (1)         (2)        (3)       (4)
   18.15697    16.15697    2.0    -0.3686055
   19.59980    17.59980    2.0    -0.08004084
   21.93468    19.73468    2.2     0.1869357
   21.55145    19.35145    2.2     0.1102906
   21.88427    19.48427    2.4    -0.02314535
   22.42648    20.02648    2.4     0.08529544
   25.40940    22.80940    2.6     0.4818807
   22.69523    20.09523    2.6    -0.06095481
   24.36465    21.56465    2.8     0.07292983
   24.39334    21.59334    2.8     0.07866819
   24.09215    21.09215    3.0    -0.1815703
   24.87450    21.87450    3.0    -0.02509900
   25.31580    22.11580    3.2    -0.1368398
   26.30465    23.10465    3.2     0.06092946
   25.78235    22.38235    3.4    -0.2435298
   26.08018    22.68018    3.4    -0.1839638
   27.24440    23.64440    3.6    -0.1511200
   28.00963    24.40963    3.6     0.001926739
   30.89301    27.09301    3.8     0.3786015
   28.98706    25.18706    3.8    -0.002588852

Source: Kenneth J. White, Nancy C. Horsman, and Justin B. Wyatt, SHAZAM: Computer Handbook for Econometrics for Use
with Damodar Gujarati: Basic Econometrics, September 1985, p. 132.


Now let us regress $C_t$ on $Y_t$, using the data given in Table 18.1. The regression results
are

$$
\begin{aligned}
\hat{C}_t &= 1.4940 + 0.82065\, Y_t \\
\text{se} &= (0.35413) \qquad (0.01434) \\
t &= (4.2188) \qquad (57.209) \qquad R^2 = 0.9945
\end{aligned}
\tag{18.4.2}
$$

As expected, the estimated $\beta_1$ is precisely the one predicted by Eq. (18.4.1). In passing,
note that the estimated $\beta_0$ is also biased: 1.4940 compared with the true value of 2.
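The point estimates in Eq. (18.4.2) can be reproduced from the Table 18.1 figures with a few lines of code. The Python sketch below (plain Python, no statistics library assumed) computes the OLS intercept and slope, along with the two sums used in Eq. (18.4.1). Because the table's $u_t$ are the true disturbances, it also lets one confirm that the slope equals 0.8 plus $\sum y_t u_t / \sum y_t^2$, as Eq. (18.3.7) says it must.

```python
# Columns of Table 18.1: income Y, consumption C, and the true disturbance u.
Y = [18.15697, 19.59980, 21.93468, 21.55145, 21.88427, 22.42648, 25.40940,
     22.69523, 24.36465, 24.39334, 24.09215, 24.87450, 25.31580, 26.30465,
     25.78235, 26.08018, 27.24440, 28.00963, 30.89301, 28.98706]
C = [16.15697, 17.59980, 19.73468, 19.35145, 19.48427, 20.02648, 22.80940,
     20.09523, 21.56465, 21.59334, 21.09215, 21.87450, 22.11580, 23.10465,
     22.38235, 22.68018, 23.64440, 24.40963, 27.09301, 25.18706]
u = [-0.3686055, -0.08004084, 0.1869357, 0.1102906, -0.02314535, 0.08529544,
     0.4818807, -0.06095481, 0.07292983, 0.07866819, -0.1815703, -0.02509900,
     -0.1368398, 0.06092946, -0.2435298, -0.1839638, -0.1511200, 0.001926739,
     0.3786015, -0.002588852]

n = len(Y)
y_bar, c_bar = sum(Y) / n, sum(C) / n
y = [yt - y_bar for yt in Y]                  # deviations y_t = Y_t - mean(Y)

sum_yy = sum(v * v for v in y)                # sum of y_t^2
sum_yc = sum(v * ct for v, ct in zip(y, C))   # sum of y_t * C_t, as in (18.3.6)
sum_yu = sum(v * ut for v, ut in zip(y, u))   # sum of y_t * u_t

b1 = sum_yc / sum_yy                          # OLS slope
b0 = c_bar - b1 * y_bar                       # OLS intercept

print(f"b0 = {b0:.4f}, b1 = {b1:.5f}")
print(f"sum y*u = {sum_yu:.3f}, sum y^2 = {sum_yy:.3f}, "
      f"0.8 + ratio = {0.8 + sum_yu / sum_yy:.5f}")
```

Rounded to the precision reported in the text, the two estimates should match 1.4940 and 0.82065, and the two sums should match 3.8 and 184.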

In general, the amount of the bias in $\hat{\beta}_1$ depends on $\beta_1$, $\sigma^2$, and var(Y) and, in particular,
on the degree of covariance between Y and u. 11 As Kenneth White et al. note, “This is what
simultaneous equation bias is all about. In contrast to single equation models, we can no
longer assume that variables on the right hand side of the equation are uncorrelated with the
error term.” 12 Bear in mind that this bias remains even in large samples.

In view of the potentially serious consequences of applying OLS in simultaneous-
equation models, is there a test of simultaneity that can tell us whether in a given instance
we have the simultaneity problem? One version of the Hausman specification test, which we
discuss in Chapter 19, can be used for this purpose.


11 See Eq. (18.3.5).

12 Op. cit., pp. 133-134.
Summary and Conclusions


1. In contrast to single-equation models, in simultaneous-equation models more than one
dependent, or endogenous, variable is involved, necessitating as many equations as the
number of endogenous variables.

2. A unique feature of simultaneous-equation models is that the endogenous variable (i.e.,
regressand) in one equation may appear as an explanatory variable (i.e., regressor) in
another equation of the system.

3. As a consequence, such an endogenous explanatory variable becomes stochastic and
is usually correlated with the disturbance term of the equation in which it appears as an
explanatory variable.

4. In this situation the classical OLS method may not be applied because the estimators
thus obtained are not consistent, that is, they do not converge to their true population
values no matter how large the sample size.

5. The Monte Carlo example presented in the text shows the nature of the bias involved in
applying OLS to estimate the parameters of a regression equation in which the
regressor is correlated with the disturbance term, which is typically the case in simultaneous-
equation models.

6. Since simultaneous-equation models are used frequently, especially in econometric
models, alternative estimating techniques have been developed by various authors.
These are discussed in Chapter 20, after the identification problem, a topic logically
prior to estimation, is considered in Chapter 19.


EXERCISES

Questions

18.1. Develop a simultaneous-equation model for the supply of and demand for dentists 
in the United States. Specify the endogenous and exogenous variables in the model. 

18.2. Develop a simple model of the demand for and supply of money in the United
States and compare your model with those developed by K. Brunner and A. H.
Meltzer* and R. Teigen.†

18.3. a. For the demand-and-supply model of Example 18.1, obtain the expression for
the probability limit of $\hat{\alpha}_1$.

b. Under what conditions will this probability limit be equal to the true $\alpha_1$?

18.4. For the IS-LM model discussed in the text, find the level of interest rate and income 
that is simultaneously compatible with the goods and money market equilibrium. 

18.5. To study the relationship between inflation and yield on common stock, Bruno
Oudet‡ used the following model:

$$R_{bt} = \alpha_1 + \alpha_2 R_{st} + \alpha_3 R_{b,t-1} + \alpha_4 L_t + \alpha_5 Y_t + \alpha_6 NIS_t + \alpha_7 I_t + u_{1t}$$

$$R_{st} = \beta_1 + \beta_2 R_{bt} + \beta_3 R_{b,t-1} + \beta_4 L_t + \beta_5 Y_t + \beta_6 NIS_t + \beta_7 E_t + u_{2t}$$


*"Some Further Evidence on Supply and Demand Functions for Money," Journal of Finance, vol. 19, May 1964, pp. 240-283.

†"Demand and Supply Functions for Money in the United States," Econometrica, vol. 32, no. 4, October 1964, pp. 476-509.

‡Bruno A. Oudet, "The Variation of the Return on Stocks in Periods of Inflation," Journal of Financial and Quantitative Analysis, vol. 8, no. 2, March 1973, pp. 247-258.
where L = real per capita monetary base
Y = real per capita income
I = the expected rate of inflation
NIS = a new issue variable
E = expected end-of-period stock returns, proxied by lagged stock price ratios
\(R_{bt}\) = bond yield
\(R_{st}\) = common stock returns

a. Offer a theoretical justification for this model and see if your reasoning agrees 
with that of Oudet. 

b. Which are the endogenous variables in the model? Which are the exogenous 
variables? 

c. How would you treat the lagged bond yield \(R_{b,t-1}\): as endogenous or exogenous?

18.6. In their article, “A Model of the Distribution of Branded Personal Products in 
Jamaica,”* John U. Farley and Harold J. Levitt developed the following model (the 
personal products considered were shaving cream, skin cream, sanitary napkins, 
and toothpaste): 

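$$
\begin{aligned}
Y_{1i} &= \alpha_1 + \beta_1 Y_{2i} + \beta_2 Y_{3i} + \beta_3 Y_{4i} + u_{1i} \\
Y_{2i} &= \alpha_2 + \beta_4 Y_{1i} + \beta_5 Y_{5i} + \gamma_1 X_{1i} + \gamma_2 X_{2i} + u_{2i} \\
Y_{3i} &= \alpha_3 + \beta_6 Y_{2i} + \gamma_3 X_{3i} + u_{3i} \\
Y_{4i} &= \alpha_4 + \beta_7 Y_{2i} + \gamma_4 X_{4i} + u_{4i} \\
Y_{5i} &= \alpha_5 + \beta_8 Y_{2i} + \beta_9 Y_{3i} + \beta_{10} Y_{4i} + u_{5i}
\end{aligned}
$$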

where \(Y_1\) = percent of stores stocking the product
\(Y_2\) = sales in units per month
\(Y_3\) = index of direct contact with importer and manufacturer for the product
\(Y_4\) = index of wholesale activity in the area
\(Y_5\) = index of depth of brand stocking for the product (i.e., average number of brands of the product stocked by stores carrying the product)
\(X_1\) = target population for the product
\(X_2\) = income per capita in the parish in which the area is located
\(X_3\) = distance from the population center of gravity to Kingston
\(X_4\) = distance from population center to nearest wholesale town

a. Can you identify the endogenous and exogenous variables in the preceding 
model? 

b. Can one or more equations in the model be estimated by the method of least 
squares? Why or why not? 

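18.7. To study the relationship between advertising expenditure and sales of cigarettes, Frank Bass used the following model:†

$$
\begin{aligned}
Y_{1t} &= \alpha_1 + \beta_1 Y_{3t} + \beta_2 Y_{4t} + \gamma_1 X_{1t} + \gamma_2 X_{2t} + u_{1t} \\
Y_{2t} &= \alpha_2 + \beta_3 Y_{3t} + \beta_4 Y_{4t} + \gamma_3 X_{1t} + \gamma_4 X_{2t} + u_{2t} \\
Y_{3t} &= \alpha_3 + \beta_5 Y_{1t} + \beta_6 Y_{2t} + u_{3t} \\
Y_{4t} &= \alpha_4 + \beta_7 Y_{1t} + \beta_8 Y_{2t} + u_{4t}
\end{aligned}
$$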


*Journal of Marketing Research, November 1968, pp. 362-368.

†"A Simultaneous Equation Regression Study of Advertising and Sales of Cigarettes," Journal of Marketing Research, vol. 6, August 1969, pp. 291-300.
where \(Y_1\) = logarithm of sales of filter cigarettes (number of cigarettes) divided by population over age 20
\(Y_2\) = logarithm of sales of nonfilter cigarettes (number of cigarettes) divided by population over age 20
\(Y_3\) = logarithm of advertising dollars for filter cigarettes divided by population over age 20 divided by advertising price index
\(Y_4\) = logarithm of advertising dollars for nonfilter cigarettes divided by population over age 20 divided by advertising price index
\(X_1\) = logarithm of disposable personal income divided by population over age 20 divided by consumer price index
\(X_2\) = logarithm of price per package of nonfilter cigarettes divided by consumer price index

a. In the preceding model the Y's are endogenous and the X's are exogenous. Why does the author assume \(X_2\) to be exogenous?

b. If \(X_2\) is treated as an endogenous variable, how would you modify the preceding model?

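18.8. G. Menges developed the following econometric model for the West German economy:*

$$
\begin{aligned}
Y_t &= \beta_0 + \beta_1 Y_{t-1} + \beta_2 I_t + u_{1t} \\
I_t &= \beta_3 + \beta_4 Y_t + \beta_5 Q_t + u_{2t} \\
C_t &= \beta_6 + \beta_7 Y_t + \beta_8 C_{t-1} + \beta_9 P_t + u_{3t} \\
Q_t &= \beta_{10} + \beta_{11} Q_{t-1} + \beta_{12} R_t + u_{4t}
\end{aligned}
$$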

where Y = national income 

I = net capital formation 
C = personal consumption 
Q = profits 

P = cost of living index 
R = industrial productivity 
t = time 

u = stochastic disturbances 

a. Which of the variables would you regard as endogenous and which as exogenous? 

b. Is there any equation in the system that can be estimated by the single-equation 
least-squares method? 

c. What is the reason behind including the variable P in the consumption function? 

18.9. L. E. Gallaway and P. E. Smith developed a simple model for the United States economy, which is as follows:†

$$
\begin{aligned}
Y_t &= C_t + I_t + G_t \\
C_t &= \beta_1 + \beta_2 YD_{t-1} + \beta_3 M_t + u_{1t} \\
I_t &= \beta_4 + \beta_5 (Y_{t-1} - Y_{t-2}) + \beta_6 Z_{t-1} + u_{2t} \\
G_t &= \beta_7 + \beta_8 G_{t-1} + u_{3t}
\end{aligned}
$$


*G. Menges, "Ein ökonometrisches Modell der Bundesrepublik Deutschland (Vier Strukturgleichungen)," I.F.O. Studien, vol. 5, 1959, pp. 1-22.

†"A Quarterly Econometric Model of the United States," Journal of the American Statistical Association, vol. 56, 1961, pp. 379-385.




where Y = gross national product
C = personal consumption expenditure
I = gross private domestic investment
G = government expenditure plus net foreign investment
YD = disposable, or after-tax, income
M = money supply at the beginning of the quarter
Z = property income before taxes
t = time
\(u_1\), \(u_2\), and \(u_3\) = stochastic disturbances

All variables are measured in the first-difference form. 

Using quarterly data for 1948-1957, the authors applied the least-squares method to each equation individually and obtained the following results:

$$
\begin{aligned}
\hat{C}_t &= 0.09 + 0.43\,YD_{t-1} + 0.23\,M_t & R^2 &= 0.23 \\
\hat{I}_t &= 0.08 + 0.43\,(Y_{t-1} - Y_{t-2}) + 0.48\,Z_t & R^2 &= 0.40 \\
\hat{G}_t &= 0.13 + 0.67\,G_{t-1} & R^2 &= 0.42
\end{aligned}
$$

a. How would you justify the use of the single-equation least-squares method in this case?

b. Why are the R 2 values rather low? 

Empirical Exercises 

18.10. Table 18.2 gives you data on Y (gross domestic product), I (gross private domestic investment), and C (personal consumption expenditure) for the United States for the period 1970-2006. All data are in billions of 1996 dollars. Assume that C is linearly related to Y as in the simple Keynesian model of income determination of Example 18.2. Obtain OLS estimates of the parameters of the consumption function. Save the results for another look at the same data using the methods developed in Chapter 20.

18.11. Using the data given in Exercise 18.10, regress gross private domestic investment I on GDP and save the results for further examination in a later chapter.

18.12. Consider the macroeconomic identity

$$C + I = Y \quad (= \text{GDP})$$

As before, assume that

$$C_t = \beta_0 + \beta_1 Y_t + u_t$$

and, following the accelerator model of macroeconomics, let

$$I_t = \alpha_0 + \alpha_1 (Y_t - Y_{t-1}) + v_t$$

where u and v are error terms. From the data given in Exercise 18.10, estimate the accelerator model and save the results for further study.

18.13. Supply and demand for gas. Table 18.3, found on the textbook website, gives data on some of the variables that determine demand for and supply of gasoline in the U.S. from January 1978 to August 2002.* The variables are: pricegas (cents per gallon); quantgas (thousands of barrels per day, unleaded); persincome (personal income, billions of dollars); and car sales (millions of cars per year).

*These data are taken from the website of Stephen J. Schmidt, Econometrics, McGraw-Hill, New York, 2005. See www.mhhe.com/economics.

TABLE 18.2 Personal Consumption Expenditure, Gross Private Domestic Investment, and GDP, United States, 1970-2006 (billions of 1996 dollars)

Year         C          I           Y
1970   2,451.9      427.1     3,771.9
1971   2,545.5      475.7     3,898.6
1972   2,701.3      532.1     4,105.0
1973   2,833.8      594.4     4,341.5
1974   2,812.3      550.6     4,319.6
1975   2,876.9      453.1     4,311.2
1976   3,035.5      544.7     4,540.9
1977   3,164.1      627.0     4,750.5
1978   3,303.1      702.6     5,015.0
1979   3,383.4      725.0     5,173.4
1980   3,374.1      645.3     5,161.7
1981   3,422.2      704.9     5,291.7
1982   3,470.3      606.0     5,189.3
1983   3,668.6      662.5     5,423.8
1984   3,863.3      857.7     5,813.6
1985   4,064.0      849.7     6,053.7
1986   4,228.9      843.9     6,263.6
1987   4,369.8      870.0     6,475.1
1988   4,546.9      890.5     6,742.7
1989   4,675.0      926.2     6,981.4
1990   4,770.3      895.1     7,112.5
1991   4,778.4      822.2     7,100.5
1992   4,934.8      889.0     7,336.6
1993   5,099.8      968.3     7,532.7
1994   5,290.7    1,099.6     7,835.5
1995   5,433.5    1,134.0     8,031.7
1996   5,619.4    1,234.3     8,328.9
1997   5,831.8    1,387.7     8,703.5
1998   6,125.8    1,524.1     9,066.9
1999   6,438.6    1,642.6     9,470.3
2000   6,739.4    1,735.5     9,817.0
2001   6,910.4    1,598.4     9,890.7
2002   7,099.3    1,557.1    10,048.8
2003   7,295.3    1,613.1    10,301.0
2004   7,561.4    1,770.2    10,675.8
2005   7,803.6    1,869.3    11,003.4
2006   8,044.1    1,919.5    11,319.4

Notes: C = personal consumption expenditure.
I = gross private domestic investment.
Y = gross domestic product.

Source: Economic Report of the President, 2008, Table B-2.

a. Develop a suitable supply-and-demand model for gasoline consumption. 

b. Which variables in the model in (a) are endogenous and which are exogenous? 

c. If you estimate the demand-and-supply functions that you have developed by 
OLS, will your results be reliable? Why or why not? 

d. Save the OLS estimates of your demand-and-supply functions for another look 
after we discuss Chapter 20. 

18.14. Table 18.4, found on the textbook website, gives macroeconomic data on several variables for the U.S. economy for the quarterly periods 1951-I to 2000-IV.* The variables are as follows: Year = date; Qtr = quarter; Realgdp = real GDP (billions of dollars); Realcons = real consumption expenditure; Realinvs = real investment by private sector; Realgovt = real government expenditure; Realdpi = real disposable personal income; CPI_U = consumer price index; M1 = nominal money stock; Tbilrate = quarterly average of month-end 90-day T-bill rate; Pop = population, millions, interpolated from year-end figures using a constant growth rate per quarter; Infl = rate of inflation (first observation is missing); and Realint = ex post real interest rate = Tbilrate - Infl (first observation missing).

Using these data, develop a simple macroeconomic model of the U.S. economy. 
You will be asked to estimate this model in Chapter 20. 

*These data are originally from the Department of Commerce, Bureau of Economic Analysis, and from www.economagic.com, and are reproduced from William H. Greene, Econometric Analysis, 6th ed., 2008, Table F5.1, p. 1083.





Chapter 19

The Identification Problem


In this chapter we consider the nature and significance of the identification problem. The 
crux of the identification problem is as follows: Recall the demand-and-supply model 
introduced in Section 18.2. Suppose that we have time series data on Q and P only and no 
additional information (such as income of the consumer, price prevailing in the previous 
period, and weather condition). The identification problem then consists in seeking an 
answer to this question: Given only the data on P and Q, how do we know whether we are 
estimating the demand function or the supply function? Alternatively, if we think we are 
fitting a demand function, how do we guarantee that it is, in fact, the demand function that 
we are estimating and not something else? 

A moment’s reflection will reveal that an answer to the preceding question is necessary 
before one proceeds to estimate the parameters of our demand function. In this chapter we 
shall show how the identification problem is resolved. We first introduce a few notations 
and definitions and then illustrate the identification problem with several examples. This is 
followed by the rules that may be used to find out whether an equation in a simultaneous-equation model is identified, that is, whether it is the relationship that we are actually estimating, be it the demand or supply function or something else.

19.1 Notations and Definitions 

To facilitate our discussion, we introduce the following notations and definitions. 

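The general M-equation model in M endogenous, or jointly dependent, variables may be written as Eq. (19.1.1):

$$
\begin{aligned}
Y_{1t} &= \beta_{12} Y_{2t} + \beta_{13} Y_{3t} + \cdots + \beta_{1M} Y_{Mt} + \gamma_{11} X_{1t} + \gamma_{12} X_{2t} + \cdots + \gamma_{1K} X_{Kt} + u_{1t} \\
Y_{2t} &= \beta_{21} Y_{1t} + \beta_{23} Y_{3t} + \cdots + \beta_{2M} Y_{Mt} + \gamma_{21} X_{1t} + \gamma_{22} X_{2t} + \cdots + \gamma_{2K} X_{Kt} + u_{2t} \\
Y_{3t} &= \beta_{31} Y_{1t} + \beta_{32} Y_{2t} + \cdots + \beta_{3M} Y_{Mt} + \gamma_{31} X_{1t} + \gamma_{32} X_{2t} + \cdots + \gamma_{3K} X_{Kt} + u_{3t} \\
&\;\;\vdots \\
Y_{Mt} &= \beta_{M1} Y_{1t} + \beta_{M2} Y_{2t} + \cdots + \beta_{M,M-1} Y_{M-1,t} + \gamma_{M1} X_{1t} + \gamma_{M2} X_{2t} + \cdots + \gamma_{MK} X_{Kt} + u_{Mt}
\end{aligned}
\tag{19.1.1}
$$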

where \(Y_1, Y_2, \ldots, Y_M\) = M endogenous, or jointly dependent, variables
\(X_1, X_2, \ldots, X_K\) = K predetermined variables (one of these X variables may take a value of unity to allow for the intercept term in each equation)
\(u_1, u_2, \ldots, u_M\) = M stochastic disturbances
\(t = 1, 2, \ldots, T\) = total number of observations
β's = coefficients of the endogenous variables
γ's = coefficients of the predetermined variables

In passing, note that not every variable need appear in every equation. In fact, we shall see in Section 19.2 that if an equation is to be identified, not all the variables in the model can appear in it.

As Eq. (19.1.1) shows, the variables entering a simultaneous-equation model are of two types: endogenous, that is, those (whose values are) determined within the model; and predetermined, that is, those (whose values are) determined outside the model. The endogenous variables are regarded as stochastic, whereas the predetermined variables are treated as nonstochastic.

The predetermined variables are divided into two categories: exogenous, current as well as lagged, and lagged endogenous. Thus, \(X_{1t}\) is a current (present-time) exogenous variable, whereas \(X_{1(t-1)}\) is a lagged exogenous variable, with a lag of one time period. \(Y_{1(t-1)}\) is a lagged endogenous variable with a lag of one time period, but since the value of \(Y_{1(t-1)}\) is known at the current time t, it is regarded as nonstochastic, hence, a predetermined variable.¹ In short, current exogenous, lagged exogenous, and lagged endogenous variables are deemed predetermined; their values are not determined by the model in the current time period.

It is up to the model builder to specify which variables are endogenous and which are predetermined. Although (noneconomic) variables, such as temperature and rainfall, are clearly exogenous or predetermined, the model builder must exercise great care in classifying economic variables as endogenous or predetermined: He or she must defend the classification on a priori or theoretical grounds. However, later in the chapter we provide a statistical test of exogeneity.

The equations appearing in (19.1.1) are known as the structural, or behavioral, equations because they may portray the structure (of an economic model) of an economy or the behavior of an economic agent (e.g., consumer or producer). The β's and γ's are known as the structural parameters or coefficients.

From the structural equations one can solve for the M endogenous variables and derive the reduced-form equations and the associated reduced-form coefficients. A reduced-form equation is one that expresses an endogenous variable solely in terms of the predetermined variables and the stochastic disturbances. To illustrate, consider the Keynesian model of income determination encountered in Chapter 18:

Consumption function:
$$C_t = \beta_0 + \beta_1 Y_t + u_t \qquad 0 < \beta_1 < 1 \tag{18.2.3}$$

Income identity:
$$Y_t = C_t + I_t \tag{18.2.4}$$

In this model C (consumption) and Y (income) are the endogenous variables and I (investment expenditure) is treated as an exogenous variable. Both these equations are structural equations, Eq. (18.2.4) being an identity. As usual, the MPC \(\beta_1\) is assumed to lie between 0 and 1.

If Eq. (18.2.3) is substituted into Eq. (18.2.4), we obtain, after simple algebraic 
manipulation, 

$$Y_t = \Pi_0 + \Pi_1 I_t + w_t \tag{19.1.2}$$


¹It is assumed implicitly here that the stochastic disturbances, the u's, are serially uncorrelated. If this is not the case, \(Y_{t-1}\) will be correlated with the current-period disturbance term \(u_t\). Hence, we cannot treat it as predetermined.
where

$$
\Pi_0 = \frac{\beta_0}{1 - \beta_1}, \qquad \Pi_1 = \frac{1}{1 - \beta_1}, \qquad w_t = \frac{u_t}{1 - \beta_1}
\tag{19.1.3}
$$


Equation (19.1.2) is a reduced-form equation; it expresses the endogenous variable Y solely as a function of the exogenous (or predetermined) variable I and the stochastic disturbance term u. \(\Pi_0\) and \(\Pi_1\) are the associated reduced-form coefficients. Notice that these reduced-form coefficients are nonlinear combinations of the structural coefficient(s).

Substituting the value of Y from Eq. (19.1.2) into Eq. (18.2.3), we obtain another reduced-form equation:

$$C_t = \Pi_2 + \Pi_3 I_t + w_t \tag{19.1.4}$$

where

$$\Pi_2 = \frac{\beta_0}{1 - \beta_1}, \qquad \Pi_3 = \frac{\beta_1}{1 - \beta_1} \tag{19.1.5}$$


The reduced-form coefficients, such as \(\Pi_1\) and \(\Pi_3\), are also known as impact, or short-run, multipliers, because they measure the immediate impact on the endogenous variable of a unit change in the value of the exogenous variable.² If in the preceding Keynesian model the investment expenditure is increased by, say, $1 and if the MPC is assumed to be 0.8, then from Eq. (19.1.3) we obtain \(\Pi_1 = 1/(1 - 0.8) = 5\). This result means that increasing the investment by $1 will immediately (i.e., in the current time period) lead to an increase in income of $5, that is, a fivefold increase. Similarly, under the assumed conditions, Eq. (19.1.5) shows that \(\Pi_3 = 0.8/(1 - 0.8) = 4\), meaning that a $1 increase in investment expenditure will lead immediately to a $4 increase in consumption expenditure.
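The multiplier arithmetic can be written as a small function. The function name is illustrative only; it evaluates \(\Pi_1 = 1/(1 - \beta_1)\) from Eq. (19.1.3) and \(\Pi_3 = \beta_1/(1 - \beta_1)\) from Eq. (19.1.5) for a given MPC.

```python
def keynesian_multipliers(mpc):
    """Impact multipliers of the reduced forms (19.1.2) and (19.1.4).

    Pi1 = 1 / (1 - beta1): effect of a unit change in I on Y.
    Pi3 = beta1 / (1 - beta1): effect of a unit change in I on C.
    """
    if not 0 < mpc < 1:
        raise ValueError("the MPC beta1 must lie strictly between 0 and 1")
    pi1 = 1.0 / (1.0 - mpc)
    pi3 = mpc / (1.0 - mpc)
    return pi1, pi3


pi1, pi3 = keynesian_multipliers(0.8)
print(round(pi1, 6), round(pi3, 6))  # 5.0 4.0
```

Because Y = C + I, the two multipliers always differ by exactly 1: a $1 rise in investment raises income by that $1 directly plus the induced rise in consumption.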

In the context of econometric models, equations such as Eq. (18.2.4) or \(Q_t^d = Q_t^s\) (quantity demanded equal to quantity supplied) are known as the equilibrium conditions. Identity (18.2.4) states that aggregate income Y must be equal to aggregate spending (i.e., consumption expenditure plus investment expenditure). When equilibrium is achieved, the endogenous variables assume their equilibrium values.³

Notice an interesting feature of the reduced-form equations. Since only the predetermined variables and stochastic disturbances appear on the right sides of these equations, and since the predetermined variables are assumed to be uncorrelated with the disturbance terms, the OLS method can be applied to estimate the coefficients of the reduced-form equations (the Π's). From the estimated reduced-form coefficients one may estimate the structural coefficients (the β's), as shown later. This procedure is known as indirect least squares (ILS), and the estimated structural coefficients are called ILS estimates.
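For the Keynesian model, indirect least squares can be sketched in two steps: fit each reduced-form equation by OLS, then solve for the structural coefficients. Since \(\Pi_1 = 1/(1 - \beta_1)\) and \(\Pi_3 = \beta_1/(1 - \beta_1)\), we have \(\beta_1 = \Pi_3/\Pi_1\); likewise \(\Pi_2/\Pi_1 = \beta_0\). The data are simulated under hypothetical values (\(\beta_0 = 2\), \(\beta_1 = 0.8\), normal disturbances), and the helper ols_fit is an illustrative name, not part of any library.

```python
import numpy as np


def ols_fit(y, x):
    """Return (intercept, slope) of the OLS regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])


# Simulated data; the structural values below are hypothetical.
rng = np.random.default_rng(42)
n, beta0, beta1 = 100_000, 2.0, 0.8
invest = 10.0 + rng.standard_normal(n)           # exogenous I
u = rng.standard_normal(n)                        # structural disturbance
income = (beta0 + invest + u) / (1.0 - beta1)     # Y from the reduced form (19.1.2)
consumption = beta0 + beta1 * income + u          # C from (18.2.3)

# Step 1: OLS on each reduced-form equation (the regressor I is exogenous).
pi2, pi3 = ols_fit(consumption, invest)           # C = Pi2 + Pi3*I + w
pi0, pi1 = ols_fit(income, invest)                # Y = Pi0 + Pi1*I + w

# Step 2: solve the reduced-form relations for the structural coefficients.
beta1_ils = pi3 / pi1                             # beta1 = Pi3 / Pi1
beta0_ils = pi2 / pi1                             # beta0 = Pi2 / Pi1
print(f"ILS estimates: beta0 = {beta0_ils:.3f}, beta1 = {beta1_ils:.3f}")
```

Unlike the direct OLS regression of C on Y, these estimates settle near the true values as the sample grows, because the regressor I in each reduced-form equation is uncorrelated with the disturbance.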


²In econometric models the exogenous variables play a crucial role. Very often, such variables are under the direct control of the government. Examples are the rate of personal and corporate taxes, subsidies, unemployment compensation, etc.

³For details, see Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, pp. 723-731.
We shall study the ILS method in greater detail in Chapter 20. In the meantime, note that since the reduced-form coefficients can be estimated by the OLS method, and since these coefficients are combinations of the structural coefficients, the possibility exists that the structural coefficients can be “retrieved” from the reduced-form coefficients, and it is in the estimation of the structural parameters that we may be ultimately interested. How does one retrieve the structural coefficients from the reduced-form coefficients? The answer is given in Section 19.2, an answer that brings out the crux of the identification problem.

19.2 The Identification Problem 


By the identification problem we mean whether numerical estimates of the parameters of 
a structural equation can be obtained from the estimated reduced-form coefficients. If this 
can be done, we say that the particular equation is identified. If this cannot be done, then we 
say that the equation under consideration is unidentified, or underidentified. 

An identified equation may be either exactly (or fully or just) identified or overidentified. 
It is said to be exactly identified if unique numerical values of the structural parameters can 
be obtained. It is said to be overidentified if more than one numerical value can be obtained 
for some of the parameters of the structural equations. The circumstances under which each 
of these cases occurs will be shown in the following discussion. 

The identification problem arises because different sets of structural coefficients may be 
compatible with the same set of data. To put the matter differently, a given reduced-form 
equation may be compatible with different structural equations or different hypotheses 
(models), and it may be difficult to tell which particular hypothesis (model) we are investigating. In the remainder of this section we consider several examples to show the nature of
the identification problem. 

Underidentification 

Consider once again the demand-and-supply model (18.2.1) and (18.2.2), together with the market-clearing, or equilibrium, condition that demand is equal to supply. By the equilibrium condition, we obtain

$$\alpha_0 + \alpha_1 P_t + u_{1t} = \beta_0 + \beta_1 P_t + u_{2t} \tag{19.2.1}$$

Solving Eq. (19.2.1), we obtain the equilibrium price

$$P_t = \Pi_0 + v_t \tag{19.2.2}$$

where

$$\Pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} \tag{19.2.3}$$

$$v_t = \frac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1} \tag{19.2.4}$$

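Substituting \(P_t\) from Eq. (19.2.2) into Eq. (18.2.1) or (18.2.2), we obtain the following equilibrium quantity:

$$Q_t = \Pi_1 + w_t \tag{19.2.5}$$

where

$$\Pi_1 = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1} \tag{19.2.6}$$

$$w_t = \frac{\alpha_1 u_{2t} - \beta_1 u_{1t}}{\alpha_1 - \beta_1} \tag{19.2.7}$$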
Incidentally, note that the error terms \(v_t\) and \(w_t\) are linear combinations of the original error terms \(u_1\) and \(u_2\).

Equations (19.2.2) and (19.2.5) are reduced-form equations. Now our demand-and-supply model contains four structural coefficients \(\alpha_0\), \(\alpha_1\), \(\beta_0\), and \(\beta_1\), but there is no unique way of estimating them. Why? The answer lies in the two reduced-form coefficients given in Eqs. (19.2.3) and (19.2.6). These reduced-form coefficients contain all four structural parameters, but there is no way in which the four structural unknowns can be estimated from only two reduced-form coefficients. Recall from high school algebra that to estimate four unknowns we must have four (independent) equations, and, in general, to estimate k unknowns we must have k (independent) equations. Incidentally, if we run the reduced-form regressions (19.2.2) and (19.2.5), we will see that there are no explanatory variables, only the constants, and these constants will simply give the mean values of P and Q (why?).
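The failure of identification can be checked numerically. The two structures below are hypothetical: each pairs a downward-sloping demand curve with an upward-sloping supply curve, and both pairs happen to intersect at P = 4, Q = 6. Because the reduced-form constants in Eqs. (19.2.3) and (19.2.6) are just the coordinates of that intersection, the two structures produce identical reduced-form coefficients, so data on P and Q alone cannot tell them apart.

```python
def reduced_form(a0, a1, b0, b1):
    """Reduced-form constants (19.2.3) and (19.2.6) when the model has no exogenous variables.

    a0, a1: demand intercept and slope; b0, b1: supply intercept and slope.
    """
    denom = a1 - b1
    pi0 = (b0 - a0) / denom              # mean equilibrium price
    pi1 = (a1 * b0 - a0 * b1) / denom    # mean equilibrium quantity
    return pi0, pi1


# Two hypothetical structures: (alpha0, alpha1, beta0, beta1)
structure_a = (10.0, -1.0, 2.0, 1.0)
structure_b = (14.0, -2.0, 0.0, 1.5)
print(reduced_form(*structure_a))  # (4.0, 6.0)
print(reduced_form(*structure_b))  # (4.0, 6.0)
```

Any other demand and supply lines crossing at (4, 6) would give the same two numbers, which is why two reduced-form coefficients cannot pin down four structural ones.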

What all this means is that, given time series data on P (price) and Q (quantity) and no other information, there is no way the researcher can guarantee whether he or she is estimating the demand function or the supply function. That is, a given \(P_t\) and \(Q_t\) represent simply the point of intersection of the appropriate demand-and-supply curves because of the equilibrium condition that demand is equal to supply. To see this clearly, consider the scattergram shown in Figure 19.1.

Figure 19.1a gives a few scatterpoints relating Q to P. Each scatterpoint represents the intersection of a demand and a supply curve, as shown in Figure 19.1b. Now consider a single point, such as that shown in Figure 19.1c. There is no way we can be sure which demand-and-supply curve of a whole family of curves shown in that panel generated that point. Clearly, some additional information about the nature of the demand-and-supply curves is needed. For example, if the demand curve shifts over time because of changes in income, tastes, etc., but the supply curve remains relatively stable, as in Figure 19.1d, the scatterpoints trace out a supply curve. In this situation, we say that the supply curve is identified. By the same token, if the supply curve shifts over time because of changes in weather conditions (in the case of agricultural commodities) or other extraneous factors but the demand curve remains relatively stable, as in Figure 19.1e, the scatterpoints trace out a demand curve. In this case, we say that the demand curve is identified.

There is an alternative and perhaps more illuminating way of looking at the identification problem. Suppose we multiply Eq. (18.2.1) by \(\lambda\) (\(0 < \lambda < 1\)) and Eq. (18.2.2) by \(1 - \lambda\) to obtain the following equations (note: we drop the superscripts on Q):

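$$\lambda Q_t = \lambda\alpha_0 + \lambda\alpha_1 P_t + \lambda u_{1t} \tag{19.2.8}$$

$$(1 - \lambda) Q_t = (1 - \lambda)\beta_0 + (1 - \lambda)\beta_1 P_t + (1 - \lambda) u_{2t} \tag{19.2.9}$$

Adding these two equations gives the following linear combination of the original demand-and-supply equations:

$$Q_t = \gamma_0 + \gamma_1 P_t + w_t \tag{19.2.10}$$

where

$$
\begin{aligned}
\gamma_0 &= \lambda\alpha_0 + (1 - \lambda)\beta_0 \\
\gamma_1 &= \lambda\alpha_1 + (1 - \lambda)\beta_1 \\
w_t &= \lambda u_{1t} + (1 - \lambda) u_{2t}
\end{aligned}
\tag{19.2.11}
$$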


The “bogus,” or “mongrel,” equation (19.2.10) is observationally indistinguishable from either Eq. (18.2.1) or Eq. (18.2.2) because each of them involves the regression of Q on P. Therefore, if we have time series data on P and Q only, any of Eqs. (18.2.1), (18.2.2), or (19.2.10) may be compatible with the same data. In other words, the same data may be compatible with the “hypothesis” Eqs. (18.2.1), (18.2.2), or (19.2.10), and there is no way we can tell which one of these hypotheses we are testing.

For an equation to be identified, that is, for its parameters to be estimated, it must be shown 
that the given set of data will not produce a structural equation that looks similar in appearance 
to the one in which we are interested. If we set out to estimate the demand function, we must 
show that the given data are not consistent with the supply function or some mongrel equation. 
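The mongrel equation is more than a theoretical curiosity: if one simply regresses Q on P, the least-squares slope settles on a particular \(\gamma_1\). The sketch below adds an assumption not made in the text: the demand and supply disturbances \(u_{1t}\) and \(u_{2t}\) are independent normal variables with variances \(\sigma_1^2\) and \(\sigma_2^2\). Under that assumption the probability limit of the OLS slope is \(\lambda\alpha_1 + (1 - \lambda)\beta_1\) with \(\lambda = \sigma_2^2/(\sigma_1^2 + \sigma_2^2)\), a member of family (19.2.11). The parameter values (\(\alpha_1 = -1\), \(\beta_1 = 1\), and the shock standard deviations) are hypothetical.

```python
import numpy as np


def simulate_market(n, a0=10.0, a1=-1.0, b0=2.0, b1=1.0,
                    sd_demand=1.0, sd_supply=1.0, seed=0):
    """Equilibrium (P, Q) from demand Q = a0 + a1*P + u1 and supply Q = b0 + b1*P + u2."""
    rng = np.random.default_rng(seed)
    u1 = sd_demand * rng.standard_normal(n)    # demand shocks
    u2 = sd_supply * rng.standard_normal(n)    # supply shocks (independent of u1)
    price = (b0 - a0 + u2 - u1) / (a1 - b1)    # market-clearing price, cf. (19.2.2)
    quantity = a0 + a1 * price + u1            # on the demand curve, hence also on supply
    return price, quantity


def ols_slope(y, x):
    """Slope from the OLS regression of y on x with an intercept."""
    x_dev = x - x.mean()
    return float(np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2))


for sd_d, sd_s in [(1.0, 1.0), (5.0, 0.5), (0.5, 5.0)]:
    p, q = simulate_market(200_000, sd_demand=sd_d, sd_supply=sd_s)
    lam = sd_s**2 / (sd_d**2 + sd_s**2)        # weight on the demand slope alpha1
    gamma1 = lam * (-1.0) + (1.0 - lam) * 1.0  # mongrel slope from (19.2.11)
    print(f"sd_demand={sd_d}, sd_supply={sd_s}: "
          f"OLS slope {ols_slope(q, p):+.3f}, mongrel gamma1 {gamma1:+.3f}")
```

When demand shocks dominate, \(\lambda\) is near 0 and the regression reproduces the supply slope, as in Figure 19.1d; when supply shocks dominate, it reproduces the demand slope, as in Figure 19.1e. With equal shock variances it returns a slope near zero that belongs to neither curve.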


Just, or Exact, Identification 

The reason we could not identify the preceding demand function or the supply function was that the same variables P and Q are present in both functions and there is no additional information, such as that indicated in Figure 19.1d or e. But suppose we consider the following demand-and-supply model:

Demand function:
$$Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t} \qquad \alpha_1 < 0,\; \alpha_2 > 0 \tag{19.2.12}$$

Supply function:
$$Q_t = \beta_0 + \beta_1 P_t + u_{2t} \qquad \beta_1 > 0 \tag{19.2.13}$$

where I = income of the consumer, an exogenous variable, and all other variables are as defined previously.

Notice that the only difference between the preceding model and our original demand-and-supply model is that there is an additional variable in the demand function, namely, income. From economic theory of demand we know that income is usually an important determinant of demand for most goods and services. Therefore, its inclusion in the demand function will give us some additional information about consumer behavior. For most commodities income is expected to have a positive effect on consumption (\(\alpha_2 > 0\)).
Using the market-clearing mechanism, quantity demanded = quantity supplied, we have

$$\alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t} = \beta_0 + \beta_1 P_t + u_{2t} \tag{19.2.14}$$

Solving Eq. (19.2.14) provides the following equilibrium value of \(P_t\):

$$P_t = \Pi_0 + \Pi_1 I_t + v_t \tag{19.2.15}$$

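where the reduced-form coefficients are

$$\Pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \qquad \Pi_1 = -\frac{\alpha_2}{\alpha_1 - \beta_1} \tag{19.2.16}$$

and

$$v_t = \frac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1}$$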


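Substituting the equilibrium value of \(P_t\) into the preceding demand or supply function, we obtain the following equilibrium quantity:

$$Q_t = \Pi_2 + \Pi_3 I_t + w_t \tag{19.2.17}$$

where

$$\Pi_2 = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}, \qquad \Pi_3 = -\frac{\alpha_2 \beta_1}{\alpha_1 - \beta_1} \tag{19.2.18}$$

and

$$w_t = \frac{\alpha_1 u_{2t} - \beta_1 u_{1t}}{\alpha_1 - \beta_1}$$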


Since Eqs. (19.2.15) and (19.2.17) are both reduced-form equations, the ordinary least squares (OLS) method can be applied to estimate their parameters. Now the demand-and-supply model (19.2.12) and (19.2.13) contains five structural coefficients: \(\alpha_0\), \(\alpha_1\), \(\alpha_2\), \(\beta_0\), and \(\beta_1\). But there are only four equations to estimate them, namely, the four reduced-form coefficients \(\Pi_0\), \(\Pi_1\), \(\Pi_2\), and \(\Pi_3\) given in Eqs. (19.2.16) and (19.2.18). Hence, a unique solution for all the structural coefficients is not possible. But it can be readily shown that the parameters of the supply function can be identified (estimated) because

$$
\begin{aligned}
\beta_0 &= \Pi_2 - \beta_1 \Pi_0 \\
\beta_1 &= \frac{\Pi_3}{\Pi_1}
\end{aligned}
\tag{19.2.19}
$$


But there is no unique way of estimating the parameters of the demand function; therefore, it remains underidentified. Incidentally, note that the structural coefficient \(\beta_1\) is a nonlinear function of the reduced-form coefficients, which poses some problems when it comes to estimating the standard error of the estimated \(\beta_1\), as we shall see in Chapter 20.
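The recovery of the supply parameters in Eq. (19.2.19) can be sketched with simulated data. Income, the disturbances, and all structural values (\(\alpha_0 = 20\), \(\alpha_1 = -1.5\), \(\alpha_2 = 0.5\), \(\beta_0 = 2\), \(\beta_1 = 1\)) are hypothetical. The two reduced-form equations are fitted by OLS and the supply coefficients are computed from the estimated Π's; nothing in the procedure yields the demand coefficients.

```python
import numpy as np


def ols_fit(y, x):
    """Return (intercept, slope) of the OLS regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])


# Hypothetical structure: demand (19.2.12) and supply (19.2.13).
a0, a1, a2 = 20.0, -1.5, 0.5
b0, b1 = 2.0, 1.0
rng = np.random.default_rng(7)
n = 100_000
income = 50.0 + 10.0 * rng.standard_normal(n)   # exogenous I
u1, u2 = rng.standard_normal(n), rng.standard_normal(n)

price = (b0 - a0 - a2 * income + u2 - u1) / (a1 - b1)  # equilibrium price, cf. (19.2.15)
quantity = b0 + b1 * price + u2                         # on the supply curve

# OLS on the two reduced-form equations.
pi0, pi1 = ols_fit(price, income)      # P = Pi0 + Pi1*I + v
pi2, pi3 = ols_fit(quantity, income)   # Q = Pi2 + Pi3*I + w

# Supply parameters from (19.2.19).
b1_ils = pi3 / pi1
b0_ils = pi2 - b1_ils * pi0
print(f"supply: beta0 = {b0_ils:.3f}, beta1 = {b1_ils:.3f}")
```

The ratio \(\hat{\Pi}_3/\hat{\Pi}_1\) is the nonlinear function mentioned above; its standard error is not a simple by-product of the two OLS fits.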

To verify that the demand function (19.2.12) cannot be identified (estimated), let us multiply it by \(\lambda\) (\(0 < \lambda < 1\)) and (19.2.13) by \(1 - \lambda\) and add them up to obtain the following “mongrel” equation:

$$Q_t = \gamma_0 + \gamma_1 P_t + \gamma_2 I_t + w_t \tag{19.2.20}$$
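where

$$
\begin{aligned}
\gamma_0 &= \lambda\alpha_0 + (1 - \lambda)\beta_0 \\
\gamma_1 &= \lambda\alpha_1 + (1 - \lambda)\beta_1 \\
\gamma_2 &= \lambda\alpha_2
\end{aligned}
\tag{19.2.21}
$$

and

$$w_t = \lambda u_{1t} + (1 - \lambda) u_{2t}$$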


Equation (19.2.20) is observationally indistinguishable from the demand function (19.2.12) although it is distinguishable from the supply function (19.2.13), which does not contain the variable I as an explanatory variable. Hence, the demand function remains unidentified.

Notice an interesting fact: It is the presence of an additional variable in the demand function that enables us to identify the supply function! Why? The inclusion of the income variable in the demand equation provides us some additional information about the variability of the function, as indicated in Figure 19.1d. The figure shows how the intersection of the stable supply curve with the shifting demand curve (on account of changes in income) enables us to trace (identify) the supply curve. As will be shown shortly, very often the identifiability of an equation depends on whether it excludes one or more variables that are included in other equations in the model.

But suppose we consider the following demand-and-supply model: 

Demand function:
$$Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t} \qquad \alpha_1 < 0,\; \alpha_2 > 0 \tag{19.2.12}$$

Supply function:
$$Q_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t} \qquad \beta_1 > 0,\; \beta_2 > 0 \tag{19.2.22}$$


where the demand function remains as before but the supply function includes an additional explanatory variable, price lagged one period. The supply function postulates that the quantity of a commodity supplied depends on its current and previous period’s price, a model often used to explain the supply of many agricultural commodities. Note that \(P_{t-1}\) is a predetermined variable because its value is known at time t.

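By the market-clearing mechanism we have

$$\alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t} = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t} \tag{19.2.23}$$

Solving this equation, we obtain the following equilibrium price:

$$P_t = \Pi_0 + \Pi_1 I_t + \Pi_2 P_{t-1} + v_t \tag{19.2.24}$$

where

$$
\Pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \qquad
\Pi_1 = -\frac{\alpha_2}{\alpha_1 - \beta_1}, \qquad
\Pi_2 = \frac{\beta_2}{\alpha_1 - \beta_1}, \qquad
v_t = \frac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1}
\tag{19.2.25}
$$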
Substituting the equilibrium price into the demand or supply equation, we obtain the corresponding equilibrium quantity:

$$Q_t = \Pi_3 + \Pi_4 I_t + \Pi_5 P_{t-1} + w_t \tag{19.2.26}$$

where the reduced-form coefficients are

$$
\Pi_3 = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}, \qquad
\Pi_4 = -\frac{\alpha_2 \beta_1}{\alpha_1 - \beta_1}, \qquad
\Pi_5 = \frac{\alpha_1 \beta_2}{\alpha_1 - \beta_1}
\tag{19.2.27}
$$

and

$$w_t = \frac{\alpha_1 u_{2t} - \beta_1 u_{1t}}{\alpha_1 - \beta_1}$$

The demand-and-supply model given in Eqs. (19.2.12) and (19.2.22) contains six structural coefficients ($\alpha_0, \alpha_1, \alpha_2, \beta_0, \beta_1, \beta_2$), and there are six reduced-form coefficients ($\Pi_0, \Pi_1, \Pi_2, \Pi_3, \Pi_4, \Pi_5$) to estimate them. Thus, we have six equations in six unknowns, and normally we should be able to obtain unique estimates. Therefore, the parameters of both the demand and supply equations can be identified, and the system as a whole can be identified. (In Exercise 19.2 the reader is asked to express the six structural coefficients in terms of the six reduced-form coefficients given previously to show that unique estimation of the model is possible.)
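A numerical round trip shows what "six equations in six unknowns" means in practice. The sketch below maps hypothetical structural values into $\Pi_0, \ldots, \Pi_5$ using Eqs. (19.2.25) and (19.2.27) and then recovers each structural coefficient uniquely; exact `Fraction` arithmetic avoids rounding noise. The parameter values and the function names `reduced_form` and `structural_from_reduced` are illustrative assumptions, and working through the algebra by hand remains the point of Exercise 19.2.

```python
from fractions import Fraction as F


def reduced_form(a0, a1, a2, b0, b1, b2):
    """Structural coefficients of (19.2.12)-(19.2.22) -> (Pi_0, ..., Pi_5) via (19.2.25), (19.2.27)."""
    d = a1 - b1
    return (
        (b0 - a0) / d,             # Pi_0
        -a2 / d,                   # Pi_1
        b2 / d,                    # Pi_2
        (a1 * b0 - a0 * b1) / d,   # Pi_3
        -a2 * b1 / d,              # Pi_4
        a1 * b2 / d,               # Pi_5
    )


def structural_from_reduced(p0, p1, p2, p3, p4, p5):
    """Invert the mapping; every step has exactly one solution (needs Pi_1, Pi_2 != 0)."""
    b1 = p4 / p1          # (-a2*b1/d) / (-a2/d)
    a1 = p5 / p2          # (a1*b2/d) / (b2/d)
    d = a1 - b1
    a2 = -p1 * d
    b2 = p2 * d
    a0 = p3 - a1 * p0     # from Pi_3*d = a1*b0 - a0*b1 together with b0 = a0 + Pi_0*d
    b0 = a0 + p0 * d
    return (a0, a1, a2, b0, b1, b2)


# Hypothetical structural values satisfying alpha_1 < 0, alpha_2 > 0, beta_1 > 0, beta_2 > 0
true_params = (F(10), F(-2), F(1, 2), F(3), F(1), F(3, 4))
pis = reduced_form(*true_params)
recovered = structural_from_reduced(*pis)
print(recovered == true_params)  # True
```

The recovery divides by $\Pi_1$ and $\Pi_2$, so it needs $\alpha_2 \neq 0$ and $\beta_2 \neq 0$: if an excluded variable does not actually enter the other equation, the inversion breaks down, a point taken up again with the rank condition in Section 19.3.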

To check that the preceding demand-and-supply functions are identified, we can also resort to the device of multiplying the demand equation (19.2.12) by $\lambda$ $(0 < \lambda < 1)$ and the supply equation (19.2.22) by $1 - \lambda$ and adding them to obtain a mongrel equation. This mongrel equation will contain both the predetermined variables $I_t$ and $P_{t-1}$; hence, it will be observationally different from the demand as well as the supply equation because the former does not contain $P_{t-1}$ and the latter does not contain $I_t$.

Overidentification 

For certain goods and services, income as well as wealth of the consumer is an important 
determinant of demand. Therefore, let us modify the demand function (19.2.12) as follows, 
keeping the supply function as before: 

Demand function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \alpha_3 R_t + u_{1t}$ (19.2.28)

Supply function: $Q_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$ (19.2.22)

where in addition to the variables already defined, R represents wealth; for most goods and 
services, wealth, like income, is expected to have a positive effect on consumption. 

Equating demand to supply, we obtain the following equilibrium price and quantity: 


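$P_t = \Pi_0 + \Pi_1 I_t + \Pi_2 R_t + \Pi_3 P_{t-1} + v_t$ (19.2.29)

$Q_t = \Pi_4 + \Pi_5 I_t + \Pi_6 R_t + \Pi_7 P_{t-1} + w_t$ (19.2.30)

where

$\Pi_0 = \dfrac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \quad \Pi_1 = -\dfrac{\alpha_2}{\alpha_1 - \beta_1}, \quad \Pi_2 = -\dfrac{\alpha_3}{\alpha_1 - \beta_1}, \quad \Pi_3 = \dfrac{\beta_2}{\alpha_1 - \beta_1}$

$\Pi_4 = \dfrac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}, \quad \Pi_5 = -\dfrac{\alpha_2 \beta_1}{\alpha_1 - \beta_1}, \quad \Pi_6 = -\dfrac{\alpha_3 \beta_1}{\alpha_1 - \beta_1}, \quad \Pi_7 = \dfrac{\alpha_1 \beta_2}{\alpha_1 - \beta_1}$ (19.2.31)

$w_t = \dfrac{\alpha_1 u_{2t} - \beta_1 u_{1t}}{\alpha_1 - \beta_1}, \quad v_t = \dfrac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1}$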


The preceding demand-and-supply model contains seven structural coefficients, but there are eight equations to estimate them: the eight reduced-form coefficients given in Eq. (19.2.31). That is, the number of equations is greater than the number of unknowns. As a result, unique estimation of all the parameters of our model is not possible, which can be shown easily. From the preceding reduced-form coefficients, we can obtain

$\beta_1 = \dfrac{\Pi_6}{\Pi_2}$ (19.2.32)

or

$\beta_1 = \dfrac{\Pi_5}{\Pi_1}$ (19.2.33)

that is, there are two estimates of the price coefficient in the supply function, and there is no guarantee that these two values or solutions will be identical.⁴ Moreover, since $\beta_1$ appears in the denominators of all the reduced-form coefficients, the ambiguity in the estimation of $\beta_1$ will be transmitted to other estimates too.
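The ambiguity is easy to see numerically. With exact reduced-form coefficients, Eqs. (19.2.32) and (19.2.33) agree; with estimates, which carry sampling error, they generally do not. The sketch below computes $\Pi_0, \ldots, \Pi_7$ from Eq. (19.2.31) for hypothetical structural values and then perturbs two of them by an arbitrary amount standing in for estimation error; all numbers are illustrative, not estimates from any data.

```python
from fractions import Fraction as F


def reduced_form(a0, a1, a2, a3, b0, b1, b2):
    """Pi_0..Pi_7 of Eq. (19.2.31) for the model (19.2.28) and (19.2.22)."""
    d = a1 - b1
    return {
        "Pi0": (b0 - a0) / d, "Pi1": -a2 / d, "Pi2": -a3 / d, "Pi3": b2 / d,
        "Pi4": (a1 * b0 - a0 * b1) / d, "Pi5": -a2 * b1 / d,
        "Pi6": -a3 * b1 / d, "Pi7": a1 * b2 / d,
    }


# Hypothetical structural values; the true beta_1 is 1
pi = reduced_form(F(10), F(-2), F(1, 2), F(1, 5), F(3), F(1), F(3, 4))
beta1_a = pi["Pi6"] / pi["Pi2"]   # Eq. (19.2.32)
beta1_b = pi["Pi5"] / pi["Pi1"]   # Eq. (19.2.33)
print(beta1_a, beta1_b)           # 1 1

# Estimated Pi's contain sampling error; mimic it with small hypothetical perturbations
est = dict(pi)
est["Pi5"] += F(1, 100)
est["Pi6"] -= F(1, 100)
est_a = est["Pi6"] / est["Pi2"]
est_b = est["Pi5"] / est["Pi1"]
print(est_a, est_b)               # 17/20 53/50
```

Two different values of $\beta_1$ emerge from the same set of "estimated" coefficients, and nothing in the reduced form tells us which one to prefer.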

Why was the supply function identified in the system (19.2.12) and (19.2.22) but not in 
the system (19.2.28) and (19.2.22), although in both cases the supply function remains the 
same? The answer is that we have “too much,” or an oversufficiency of information, to 
identify the supply curve. This situation is the opposite of the case of underidentification, 
where there is too little information. The oversufficiency of the information results from the 
fact that in the model (19.2.12) and (19.2.22) the exclusion of the income variable from 
the supply function was enough to identify it, but in the model (19.2.28) and (19.2.22) the 
supply function excludes not only the income variable but also the wealth variable. In other 
words, in the latter model we put “too many” restrictions on the supply function by 
requiring it to exclude more variables than necessary to identify it. However, overidentification is not necessarily bad; we shall see in Chapter 20 how to handle the problem of too much information, or too many restrictions.

We have now exhausted all the cases. As the preceding discussion shows, an equation in a simultaneous-equation model may be underidentified or identified (either over- or just). The model as a whole is identified if each equation in it is identified. To check identification, we have so far resorted to the reduced-form equations. In Section 19.3, however, we consider an alternative and perhaps less time-consuming method of determining whether or not an equation in a simultaneous-equation model is identified.


4 Notice the difference between under- and overidentification. In the former case, it is impossible 
to obtain estimates of the structural parameters, whereas in the latter case, there may be several 
estimates of one or more structural coefficients. 
19.3 Rules for Identification



As the examples in Section 19.2 show, in principle it is possible to resort to the reduced-form equations to determine the identification of an equation in a system of simultaneous equations. But these examples also show how time-consuming and laborious the process can be. Fortunately, it is not essential to use this procedure. The so-called order and rank conditions of identification lighten the task by providing a systematic routine.

To understand the order and rank conditions, we introduce the following notations:

M = number of endogenous variables in the model
m = number of endogenous variables in a given equation
K = number of predetermined variables in the model including the intercept
k = number of predetermined variables in a given equation

The Order Condition of Identifiability 5 

A necessary (but not sufficient) condition of identification, known as the order condition, may be stated in two different but equivalent ways as follows (the necessary as well as sufficient condition of identification will be presented shortly):

Definition 19.1 

In a model of M simultaneous equations, in order for an equation to be identified, it must 
exclude at least M - 1 variables (endogenous as well as predetermined) appearing in the 
model. If it excludes exactly M - 1 variables, the equation is just identified. If it excludes 
more than M - 1 variables, it is overidentified. 


Definition 19.2 

In a model of M simultaneous equations, in order for an equation to be identified, the 
number of predetermined variables excluded from the equation must not be less than the 
number of endogenous variables included in that equation less 1, that is, 

$K - k \ge m - 1$ (19.3.1)

If $K - k = m - 1$, the equation is just identified, but if $K - k > m - 1$, it is overidentified.

In Exercise 19.1 the reader is asked to prove that the preceding two definitions of identification are equivalent.
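A short Python sketch makes the two definitions concrete. `order_condition` applies Eq. (19.3.1) directly, while `order_condition_def_19_1` counts every excluded variable, $(M - m) + (K - k)$, against M - 1 as in Definition 19.1; both function names are hypothetical. The brute-force loop only checks agreement over small models and is not a substitute for the proof requested in Exercise 19.1.

```python
def order_condition(K, k, m):
    """Definition 19.2 / Eq. (19.3.1).

    K: predetermined variables in the model (including the intercept)
    k: predetermined variables in the equation
    m: endogenous variables in the equation
    """
    if K - k < m - 1:
        return "unidentified"
    return "just identified" if K - k == m - 1 else "overidentified"


def order_condition_def_19_1(M, K, k, m):
    """Definition 19.1: count every excluded variable, endogenous and predetermined."""
    excluded = (M - m) + (K - k)
    if excluded < M - 1:
        return "unidentified"
    return "just identified" if excluded == M - 1 else "overidentified"


# Brute-force agreement check over small models (an illustration, not a proof)
agree = all(
    order_condition(K, k, m) == order_condition_def_19_1(M, K, k, m)
    for M in range(1, 6)
    for m in range(1, M + 1)
    for K in range(1, 6)
    for k in range(1, K + 1)
)
print(agree)                      # True
print(order_condition(1, 1, 2))   # Example 19.1 demand equation: unidentified
```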

To illustrate the order condition, let us revert to our previous examples. 

EXAMPLE 19.1 

Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + u_{1t}$ (18.2.1)

Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}$ (18.2.2)

This model has two endogenous variables P and Q and no predetermined variables. To be identified, each of these equations must exclude at least M - 1 = 1 variable. Since this is not the case, neither equation is identified.


EXAMPLE 19.2 

Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t}$ (19.2.12)

Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}$ (19.2.13)

In this model Q and P are endogenous and I is exogenous. Applying the order condition given in Eq. (19.3.1), we see that the demand function is unidentified: it contains every variable in the model, so $K - k = 0 < m - 1 = 1$. On the other hand, the supply function is just identified because it excludes exactly M - 1 = 1 variable, $I_t$.


5 The term order refers to the order of a matrix, that is, the number of rows and columns present in a 
matrix. See Appendix B. 
EXAMPLE 19.3

Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t}$ (19.2.12)

Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$ (19.2.22)

Given that $P_t$ and $Q_t$ are endogenous and $I_t$ and $P_{t-1}$ are predetermined, Eq. (19.2.12) excludes exactly one variable, $P_{t-1}$, and Eq. (19.2.22) also excludes exactly one variable, $I_t$. Hence each equation is identified by the order condition. Therefore, the model as a whole is identified.


EXAMPLE 19.4

Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \alpha_3 R_t + u_{1t}$ (19.2.28)

Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$ (19.2.22)

In this model $P_t$ and $Q_t$ are endogenous and $I_t$, $R_t$, and $P_{t-1}$ are predetermined. The demand function excludes exactly one variable, $P_{t-1}$, and hence by the order condition it is exactly identified. But the supply function excludes two variables, $I_t$ and $R_t$, and hence it is overidentified. As noted before, in this case there are two ways of estimating $\beta_1$, the coefficient of the price variable.

Notice a slight complication here. By the order condition the demand function is identified. But if we try to estimate the parameters of this equation from the reduced-form coefficients given in Eq. (19.2.31), the estimates will not be unique because $\beta_1$, which enters into the computations, takes two values and we shall have to decide which of these values is appropriate. This complication can be avoided, however, because it is shown in Chapter 20 that in cases of overidentification the method of indirect least squares is not appropriate and should be discarded in favor of other methods. One such method is two-stage least squares, which we shall discuss fully in Chapter 20.
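The variable counting in Examples 19.1 through 19.4 is mechanical enough to automate. In this sketch each model is described by sets of variable names, with `const` standing for the intercept (which counts toward K and k, as in the notation above) and `P_lag` standing for $P_{t-1}$; these labels and the function name `classify` are illustrative assumptions. The output is only the order condition, which is necessary but not sufficient.

```python
def classify(model_endog, model_predet, eq_vars):
    """Order condition (19.3.1) for one equation, given sets of variable names."""
    K = len(model_predet)                # predetermined variables in the model
    k = len(eq_vars & model_predet)      # ... included in this equation
    m = len(eq_vars & model_endog)       # endogenous variables in this equation
    if K - k < m - 1:
        return "unidentified"
    return "just identified" if K - k == m - 1 else "overidentified"


ENDOG = {"P", "Q"}
# name -> (predetermined variables in the model, [demand equation, supply equation])
EXAMPLES = {
    "19.1": ({"const"},
             [{"Q", "P", "const"}, {"Q", "P", "const"}]),
    "19.2": ({"const", "I"},
             [{"Q", "P", "I", "const"}, {"Q", "P", "const"}]),
    "19.3": ({"const", "I", "P_lag"},
             [{"Q", "P", "I", "const"}, {"Q", "P", "P_lag", "const"}]),
    "19.4": ({"const", "I", "R", "P_lag"},
             [{"Q", "P", "I", "R", "const"}, {"Q", "P", "P_lag", "const"}]),
}

results = {name: [classify(ENDOG, predet, eq) for eq in eqs]
           for name, (predet, eqs) in EXAMPLES.items()}
for name, (demand, supply) in results.items():
    print(f"Example {name}: demand {demand}, supply {supply}")
```

The printed verdicts reproduce the conclusions reached by hand in the four examples.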


As the previous examples show, identification of an equation in a model of simultaneous equations is possible if that equation excludes one or more variables that are present elsewhere in the model. This situation is known as the exclusion (of variables) criterion, or the zero restrictions criterion (the coefficients of variables not appearing in an equation are assumed to have zero values). This criterion is by far the most commonly used method of securing or determining identification of an equation. But notice that the zero restrictions criterion is based on a priori or theoretical expectations that certain variables do not appear in a given equation. It is up to the researcher to spell out clearly why he or she expects certain variables to appear in some equations and not in others.

The Rank Condition of Identifiability 6 

The order condition discussed previously is a necessary but not sufficient condition for identification; that is, even if it is satisfied, it may happen that an equation is not identified. Thus, in Example 19.2, the supply equation was identified by the order condition because it excluded the income variable $I_t$, which appeared in the demand function. But identification is accomplished only if $\alpha_2$, the coefficient of $I_t$ in the demand function, is not zero, that is, if the income variable not only probably but actually does enter the demand function.

6 The term rank refers to the rank of a matrix and is given by the order of the largest square submatrix (contained in the given matrix) whose determinant is nonzero. Alternatively, the rank of a matrix is the largest number of linearly independent rows or columns of that matrix. See Appendix B.

More generally, even if the order condition $K - k \ge m - 1$ is satisfied by an equation, it may be unidentified because the predetermined variables excluded from this equation but present in the model may not all be independent, so that there may not be a one-to-one correspondence between the structural coefficients (the $\beta$s) and the reduced-form coefficients (the $\Pi$s). That is, we may not be able to estimate the structural parameters from the reduced-form coefficients, as we shall show shortly. Therefore, we need a condition for identification that is both necessary and sufficient. This is provided by the rank condition of identification, which may be stated as follows:


Rank Condition of Identification: In a model containing M equations in M endogenous variables, an equation is identified if and only if at least one nonzero determinant of order (M - 1) × (M - 1) can be constructed from the coefficients of the variables (both endogenous and predetermined) excluded from that particular equation but included in the other equations of the model.


As an illustration of the rank condition of identification, consider the following hypothetical system of simultaneous equations in which the Y variables are endogenous and the X variables are predetermined.⁷

$Y_{1t} - \beta_{10} - \beta_{12} Y_{2t} - \beta_{13} Y_{3t} - \gamma_{11} X_{1t} = u_{1t}$ (19.3.2)

$Y_{2t} - \beta_{20} - \beta_{23} Y_{3t} - \gamma_{21} X_{1t} - \gamma_{22} X_{2t} = u_{2t}$ (19.3.3)

$Y_{3t} - \beta_{30} - \beta_{31} Y_{1t} - \gamma_{31} X_{1t} - \gamma_{32} X_{2t} = u_{3t}$ (19.3.4)

$Y_{4t} - \beta_{40} - \beta_{41} Y_{1t} - \beta_{42} Y_{2t} - \gamma_{43} X_{3t} = u_{4t}$ (19.3.5)

To facilitate identification, let us write the preceding system in Table 19.1, which is self-explanatory.

TABLE 19.1
Coefficients of the Variables

Equation No.    1      Y1     Y2     Y3     Y4     X1     X2     X3
(19.3.2)      -β10     1    -β12   -β13     0    -γ11     0      0
(19.3.3)      -β20     0      1    -β23     0    -γ21   -γ22     0
(19.3.4)      -β30   -β31     0      1      0    -γ31   -γ32     0
(19.3.5)      -β40   -β41   -β42     0      1      0      0    -γ43

Let us first apply the order condition of identification, as shown in Table 19.2. By the order condition each equation is identified.

TABLE 19.2

               No. of Predetermined    No. of Endogenous Variables
Equation No.   Variables Excluded,     Included, Less One,            Identified?
               (K - k)                 (m - 1)
(19.3.2)       2                       2                              Exactly
(19.3.3)       1                       1                              Exactly
(19.3.4)       1                       1                              Exactly
(19.3.5)       2                       2                              Exactly

Let us recheck with the rank condition. Consider the first equation, which excludes variables $Y_4$, $X_2$, and $X_3$ (this is represented by zeros in the first row of Table 19.1). For this equation to be identified, we must obtain at least one nonzero determinant of order 3 × 3 from the coefficients of the variables excluded from this equation but included in other equations. To obtain the determinant we first obtain the relevant matrix of coefficients of variables $Y_4$, $X_2$, and $X_3$ included in the other equations. In the present case there is only one such matrix, call it A, defined as follows:

$A = \begin{bmatrix} 0 & -\gamma_{22} & 0 \\ 0 & -\gamma_{32} & 0 \\ 1 & 0 & -\gamma_{43} \end{bmatrix}$ (19.3.6)

It can be seen that the determinant of this matrix is zero:

$\det A = \begin{vmatrix} 0 & -\gamma_{22} & 0 \\ 0 & -\gamma_{32} & 0 \\ 1 & 0 & -\gamma_{43} \end{vmatrix} = 0$ (19.3.7)

Since the determinant is zero, the rank of the matrix (19.3.6), denoted by $\rho(A)$, is less than 3. Therefore, Eq. (19.3.2) does not satisfy the rank condition and hence is not identified.

7 The simultaneous-equation system presented in Eq. (19.1.1) may be shown in the following alternative form, which may be convenient for matrix manipulations.
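To see why the determinant vanishes, read the coefficients of $Y_4$, $X_2$, and $X_3$ in Eqs. (19.3.3), (19.3.4), and (19.3.5) off Table 19.1: the rows are $(0, -\gamma_{22}, 0)$, $(0, -\gamma_{32}, 0)$, and $(1, 0, -\gamma_{43})$. Expanding along the first column, whose only nonzero entry is the 1 in the third row,

$$\det A = (+1)\cdot 1 \cdot \begin{vmatrix} -\gamma_{22} & 0 \\ -\gamma_{32} & 0 \end{vmatrix} = (-\gamma_{22})(0) - (0)(-\gamma_{32}) = 0$$

whatever the values of $\gamma_{22}$, $\gamma_{32}$, and $\gamma_{43}$. Equivalently, the first two rows are proportional, so they are linearly dependent.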

As noted, the rank condition is both a necessary and sufficient condition for identification. Therefore, although the order condition shows that Eq. (19.3.2) is identified, the rank condition shows that it is not. Apparently, the columns or rows of the matrix A given in Eq. (19.3.6) are not (linearly) independent, meaning that there is some relationship between the variables $Y_4$, $X_2$, and $X_3$. As a result, we may not have enough information to estimate the parameters of equation (19.3.2); the reduced-form equations for the preceding model will show that it is not possible to obtain the structural coefficients of that equation from the reduced-form coefficients. The reader should verify that by the rank condition Eqs. (19.3.3) and (19.3.4) are also unidentified but Eq. (19.3.5) is identified.
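As a check on the last claim: Eq. (19.3.5) excludes $Y_3$, $X_1$, and $X_2$, and their coefficients in Eqs. (19.3.2), (19.3.3), and (19.3.4), read from Table 19.1, form a 3 × 3 matrix whose determinant is

$$\begin{vmatrix} -\beta_{13} & -\gamma_{11} & 0 \\ -\beta_{23} & -\gamma_{21} & -\gamma_{22} \\ 1 & -\gamma_{31} & -\gamma_{32} \end{vmatrix} = -\beta_{13}(\gamma_{21}\gamma_{32} - \gamma_{22}\gamma_{31}) + \gamma_{11}(\beta_{23}\gamma_{32} + \gamma_{22})$$

This expression is nonzero unless the parameters happen to make it exactly zero, so the matrix has rank 3 = M - 1 and Eq. (19.3.5) passes the rank condition. By contrast, Eqs. (19.3.3) and (19.3.4) both exclude $Y_4$ and $X_3$, which appear only in Eq. (19.3.5); among the other equations their columns are $(0, 0, 1)$ and $(0, 0, -\gamma_{43})$, which are proportional, so the determinant is zero in each case.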

As the preceding discussion shows, the rank condition tells us whether the equation 
under consideration is identified or not, whereas the order condition tells us if it is exactly 
identified or overidentified. 

To apply the rank condition one may proceed as follows: 

1. Write down the system in a tabular form, as shown in Table 19.1. 

2. Strike out the coefficients of the row in which the equation under consideration appears. 

3. Also strike out the columns corresponding to those coefficients in step (2) which are 
nonzero. 

4. The entries left in the table will then give only the coefficients of the variables included in the system but not in the equation under consideration. From these entries form all possible matrices, like A, of order M - 1 and obtain the corresponding determinants. If at least one nonvanishing or nonzero determinant can be found, the equation in question is (just or over-) identified. The rank of the matrix, say, A, in this case is exactly equal to M - 1. If all the possible (M - 1) × (M - 1) determinants are zero, the rank of the matrix A is less than M - 1 and the equation under investigation is not identified.
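The four steps can be sketched in Python. This is a minimal illustration, not a general-purpose tool: it assumes the coefficient table is given numerically, so the symbols of Table 19.1 are replaced by arbitrary nonzero numbers chosen purely for illustration (a different choice could, by coincidence, make a determinant vanish). Rather than listing every (M - 1) × (M - 1) determinant, it computes the rank of the leftover matrix by exact Gaussian elimination; since that matrix has M - 1 rows, its rank equals M - 1 exactly when at least one such determinant is nonzero. The helper names `rank` and `rank_condition` are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                    # number of pivots found so far
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue                         # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def rank_condition(table, eq):
    """Steps 2-4: strike row `eq` and every column in which that row is nonzero,
    then return (rank of what is left, M - 1)."""
    M = len(table)
    kept_cols = [j for j, c in enumerate(table[eq]) if c == 0]
    A = [[row[j] for j in kept_cols] for i, row in enumerate(table) if i != eq]
    return rank(A), M - 1


# Table 19.1 with hypothetical nonzero numbers in place of the betas and gammas.
# Columns: 1 (intercept), Y1, Y2, Y3, Y4, X1, X2, X3
TABLE_19_1 = [
    [-2,  1, -3, -5, 0, -7,  0,  0],   # (19.3.2)
    [-1,  0,  1, -4, 0, -6, -8,  0],   # (19.3.3)
    [-3, -2,  0,  1, 0, -5, -9,  0],   # (19.3.4)
    [-4, -1, -6,  0, 1,  0,  0, -3],   # (19.3.5)
]
NAMES = ["(19.3.2)", "(19.3.3)", "(19.3.4)", "(19.3.5)"]

results = {}
for i, name in enumerate(NAMES):
    r, target = rank_condition(TABLE_19_1, i)
    results[name] = r
    print(name, "rank", r, "identified" if r == target else "not identified")
```

Exact `Fraction` arithmetic avoids the rounding problems a floating-point rank test would face; the ranks 2, 2, 2, and 3 agree with the verdicts stated above for Eqs. (19.3.2) through (19.3.5).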

Our discussion of the order and rank conditions of identification leads to the following 
general principles of identifiability of a structural equation in a system of M simultaneous 
equations: 


1. If $K - k > m - 1$ and the rank of the A matrix is M - 1, the equation is overidentified.

2. If $K - k = m - 1$ and the rank of the matrix A is M - 1, the equation is exactly identified.

3. If $K - k \ge m - 1$ and the rank of the matrix A is less than M - 1, the equation is underidentified.

4. If $K - k < m - 1$, the structural equation is unidentified. The rank of the A matrix in this case is bound to be less than M - 1. (Why?)
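The four principles can be condensed into a small decision function. It is a sketch that takes the counts $K - k$ and $m - 1$ and the rank of A as given (for instance, from Table 19.2 and a rank computation of the kind described in the steps above); the function name `identify` is hypothetical.

```python
def identify(excluded_predetermined, endog_less_one, rank_A, M):
    """Apply principles 1-4 of Section 19.3.

    excluded_predetermined: K - k
    endog_less_one:         m - 1
    rank_A:                 rank of the matrix A built for this equation
    M:                      number of endogenous variables (equations) in the model
    """
    if excluded_predetermined < endog_less_one:
        return "unidentified"        # principle 4: the order condition fails
    if rank_A < M - 1:
        return "underidentified"     # principle 3: order condition holds, rank condition fails
    if excluded_predetermined == endog_less_one:
        return "exactly identified"  # principle 2
    return "overidentified"          # principle 1


# Eq. (19.3.2): K - k = 2, m - 1 = 2 (Table 19.2), rank of A = 2 < M - 1 = 3
print(identify(2, 2, 2, 4))   # underidentified
# Eq. (19.3.5): K - k = 2, m - 1 = 2, and its matrix has rank 3
print(identify(2, 2, 3, 4))   # exactly identified
```

The order of the checks mirrors the logic of the text: the cheap order condition is tested first, and the rank condition decides only whether an equation that passes it is really identified.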
Henceforth, when we talk about identification we mean exact identification or overidentification. There is no point in considering unidentified, or underidentified, equations because no matter how extensive the data, the structural parameters cannot be estimated. Besides, most simultaneous-equation systems in economics and finance are overidentified rather than underidentified, so we need not worry too much about underidentification. As shown in Chapter 20, the parameters of overidentified as well as just identified equations can be estimated.

Which condition should one use in practice: Order or rank? For large simultaneous- 
equation models, applying the rank condition is a formidable task. Therefore, as Harvey notes, 

Fortunately, the order condition is usually sufficient to ensure identifiability, and although it is 
important to be aware of the rank condition, a failure to verify it will rarely result in disaster. 8 

*19.4 A Test of Simultaneity 9 

If there is no simultaneous-equation (simultaneity) problem, the OLS estimators are consistent and efficient. On the other hand, if there is simultaneity, OLS estimators are not even consistent. In the presence of simultaneity, as we will show in Chapter 20, the methods of two-stage least squares (2SLS) and instrumental variables (IV) will give estimators that are consistent and efficient. Oddly, if we apply these alternative methods when there is in fact no simultaneity, these methods yield estimators that are consistent but not efficient (i.e., they do not have the smallest variance). This discussion suggests that we should check for the simultaneity problem before we discard OLS in favor of the alternatives.

As we showed earlier, the simultaneity problem arises because some of the regressors are endogenous and are therefore likely to be correlated with the disturbance, or error, term. Therefore, a test of simultaneity is essentially a test of whether an (endogenous) regressor is correlated with the error term. If it is, the simultaneity problem exists, in which case alternatives to OLS must be found; if it is not, we can use OLS. To find out which is the case in a concrete situation, we can use Hausman's specification error test.

Hausman Specification Test 

A version of the Hausman specification error test that can be used for testing the simultaneity problem can be explained as follows:¹⁰

To fix ideas, consider the following two-equation model: 

Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \alpha_3 R_t + u_{1t}$ (19.4.1)

Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}$ (19.4.2)

where P = price
Q = quantity
I = income
R = wealth
u's = error terms

Assume that I and R are exogenous. Of course, P and Q are endogenous.

*Optional.

8 Andrew Harvey, The Econometric Analysis of Time Series, 2d ed., The MIT Press, Cambridge, Mass., 
1990, p. 328. 

9 The following discussion draws from Robert S. Pindyck and Daniel L. Rubinfeld, Econometric Models and Economic Forecasts, 3d ed., McGraw-Hill, New York, 1991, pp. 303-305.

10 J. A. Hausman, "Specification Tests in Econometrics," Econometrica, vol. 46, November 1978, pp. 1251-1271. See also A. Nakamura and M. Nakamura, "On the Relationship among Several Specification Error Tests Presented by Durbin, Wu, and Hausman," Econometrica, vol. 49, November 1981, pp. 1583-1588.
Now consider the supply function (19.4.2). If there is no simultaneity problem (i.e., P and Q are mutually independent), $P_t$ and $u_{2t}$ should be uncorrelated (why?). On the other hand, if there is simultaneity, $P_t$ and $u_{2t}$ will be correlated. To find out which is the case, the Hausman test proceeds as follows:

First, from Eqs. (19.4.1) and (19.4.2) we obtain the following reduced-form equations:

$P_t = \Pi_0 + \Pi_1 I_t + \Pi_2 R_t + v_t$ (19.4.3)

$Q_t = \Pi_3 + \Pi_4 I_t + \Pi_5 R_t + w_t$ (19.4.4)

where v and w are the reduced-form error terms. Estimating Eq. (19.4.3) by OLS we obtain

$\hat{P}_t = \hat{\Pi}_0 + \hat{\Pi}_1 I_t + \hat{\Pi}_2 R_t$ (19.4.5)

Therefore,

$P_t = \hat{P}_t + \hat{v}_t$ (19.4.6)

where $\hat{P}_t$ are the estimated $P_t$ and $\hat{v}_t$ are the estimated residuals. Now consider the following equation, obtained by substituting Eq. (19.4.6) into Eq. (19.4.2):

$Q_t = \beta_0 + \beta_1 \hat{P}_t + \beta_1 \hat{v}_t + u_{2t}$ (19.4.7)

Note: The coefficients of $\hat{P}_t$ and $\hat{v}_t$ are the same. The difference between this equation and the original supply equation is that it includes the additional variable $\hat{v}_t$, the residual from regression (19.4.3).

Now, under the null hypothesis that there is no simultaneity, that is, that $P_t$ is not an endogenous variable, the correlation between $\hat{v}_t$ and $u_{2t}$ should be zero, asymptotically. Thus, if we run the regression (19.4.7) and find that the coefficient of $\hat{v}_t$ in Eq. (19.4.7) is statistically zero, we can conclude that there is no simultaneity problem. Of course, this conclusion will be reversed if we find this coefficient to be statistically significant. In passing, note that Hausman's simultaneity test is also known as the Hausman test of endogeneity: In the present example we want to find out if $P_t$ is endogenous. If it is, we have the simultaneity problem.

Essentially, then, the Hausman test involves the following steps: 

Step 1. Regress $P_t$ on $I_t$ and $R_t$ to obtain $\hat{v}_t$.

Step 2. Regress $Q_t$ on $\hat{P}_t$ and $\hat{v}_t$ and perform a t test on the coefficient of $\hat{v}_t$. If it is significant, reject the null hypothesis of no simultaneity; otherwise, do not reject it.¹¹ For efficient estimation, however, Pindyck and Rubinfeld suggest regressing $Q_t$ on $P_t$ and $\hat{v}_t$.¹²
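To make the two steps concrete, here is a small simulation sketch in plain Python (no third-party libraries). Everything in it is an assumption made for illustration: the parameter values, the normal errors with equal variances, the distributions of $I_t$ and $R_t$, and the helper names `invert`, `ols`, and `simulate`. Data are generated from the model (19.4.1)-(19.4.2), so $P_t$ is endogenous by construction; Step 1 is run as described, and Step 2 uses the Pindyck-Rubinfeld variant mentioned in the text (regressing $Q_t$ on $P_t$ and $\hat{v}_t$).

```python
import math
import random


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for a small X'X)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def ols(y, X):
    """OLS of y on the columns of X (a list of rows). Returns (coefs, std_errors, residuals)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yt - sum(bj * xj for bj, xj in zip(b, row)) for row, yt in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[j][j]) for j in range(k)]
    return b, se, resid


def simulate(n=500, seed=12345):
    """Draw (I, R, P, Q) from (19.4.1)-(19.4.2) with hypothetical parameter values."""
    rng = random.Random(seed)
    a0, a1, a2, a3 = 100.0, -1.0, 0.5, 0.3   # demand coefficients (assumed)
    b0, b1 = 10.0, 1.0                        # supply coefficients (assumed)
    I, R, P, Q = [], [], [], []
    for _ in range(n):
        inc, wealth = rng.gauss(50, 10), rng.gauss(100, 20)
        u1, u2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Market clearing (demand = supply) solved for the equilibrium price
        p = (b0 - a0 - a2 * inc - a3 * wealth + u2 - u1) / (a1 - b1)
        I.append(inc)
        R.append(wealth)
        P.append(p)
        Q.append(b0 + b1 * p + u2)           # quantity from the supply equation
    return I, R, P, Q


I, R, P, Q = simulate()
# Step 1: reduced-form regression of P on a constant, I and R; keep the residuals
_, _, v_hat = ols(P, [[1.0, i, r] for i, r in zip(I, R)])
# Step 2 (Pindyck-Rubinfeld form): regress Q on a constant, P and v_hat
b, se, _ = ols(Q, [[1.0, p, v] for p, v in zip(P, v_hat)])
t_v = b[2] / se[2]
print(f"coef on P = {b[1]:.3f}, coef on v_hat = {b[2]:.3f}, t = {t_v:.1f}")
```

With these settings the coefficient of $\hat{v}_t$ comes out close to -1 with a t ratio far beyond conventional critical values, so the null of no simultaneity is rejected, as it should be. The value near -1 is not an accident of the draw: with equal error variances, $\mathrm{Cov}(u_{2t}, v_t)/\mathrm{Var}(v_t) = (\alpha_1 - \beta_1)/2 = -1$ under the assumed parameters. Rerunning the sketch with data in which $P_t$ is generated independently of $u_{2t}$ should, in most samples, give an insignificant coefficient.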

There are alternative ways to apply the Hausman test, which are given by way of an 
exercise. 

EXAMPLE 19.5

Pindyck-Rubinfeld Model of Public Spending¹³

To study the behavior of U.S. state and local government expenditure, the authors developed the following simultaneous-equation model:

$\mathrm{EXP} = \beta_1 + \beta_2\,\mathrm{AID} + \beta_3\,\mathrm{INC} + \beta_4\,\mathrm{POP} + u_i$ (19.4.8)

$\mathrm{AID} = \delta_1 + \delta_2\,\mathrm{EXP} + \delta_3\,\mathrm{PS} + v_i$ (19.4.9)

where EXP = state and local government public expenditures
AID = level of federal grants-in-aid
INC = income of states
POP = state population
PS = population of primary and secondary school children
u and v = error terms

In this model, INC, POP, and PS are regarded as exogenous.

11 If more than one endogenous regressor is involved, we will have to use the F test.

12 Pindyck and Rubinfeld, op. cit., p. 304. Note: The regressor is $P_t$ and not $\hat{P}_t$.

13 Pindyck and Rubinfeld, op. cit., pp. 176-177. Notations slightly altered.
EXAMPLE 19.5 (Continued)

Because of the possibility of simultaneity between EXP and AID, the authors first regress AID on INC, POP, and PS (i.e., the reduced-form regression). Let the error term in this regression be $w_i$. From this regression the calculated residual is $\hat{w}_i$. The authors then regress EXP on AID, INC, POP, and $\hat{w}_i$ to obtain the following results:

$\widehat{\mathrm{EXP}} = -89.41 + 4.50\,\mathrm{AID} + 0.00013\,\mathrm{INC} - 0.518\,\mathrm{POP} - 1.39\,\hat{w}_i$
t = (-1.04)   (5.89)   (3.06)   (-4.63)   (-1.73)   (19.4.10)¹⁴
R² = 0.99

At the 5 percent level of significance, the coefficient of $\hat{w}_i$ is not statistically significant, and therefore, at this level, there is no simultaneity problem. However, at the 10 percent level of significance, it is statistically significant, raising the possibility that the simultaneity problem is present.
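The two verdicts can be reconciled by computing the p value of the t ratio -1.73 reported for $\hat{w}_i$ in Eq. (19.4.10). The sketch below uses the standard normal distribution from Python's `statistics` module as a large-sample approximation. This is an assumption: the exact p value would come from the t distribution with the regression's degrees of freedom, which are not reported here, and would be somewhat larger.

```python
from statistics import NormalDist


def two_sided_p(t):
    """Two-sided p value of a t ratio under the standard normal (large-sample) approximation."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


p = two_sided_p(-1.73)   # t ratio of w_hat in Eq. (19.4.10)
print(f"p = {p:.3f}")    # p = 0.084
print("significant at 5%:", p < 0.05, "| significant at 10%:", p < 0.10)
```

A p value of about 0.084 lies between 0.05 and 0.10, which is why the coefficient is insignificant at the 5 percent level but significant at the 10 percent level.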

Incidentally, the OLS estimation of Eq. (19.4.8) is as follows: 

$\widehat{\mathrm{EXP}} = -46.81 + 3.24\,\mathrm{AID} + 0.00019\,\mathrm{INC} - 0.597\,\mathrm{POP}$
t = (-0.56)   (13.64)   (8.12)   (-5.71)   (19.4.11)
R² = 0.993

Notice an interesting feature of the results given in Eqs. (19.4.10) and (19.4.11): When 
simultaneity is explicitly taken into account, the AID variable is less significant although 
numerically it is greater in magnitude. 


*19.5 Tests for Exogeneity

We noted earlier that it is the researcher’s responsibility to specify which variables are 
endogenous and which are exogenous. This will depend on the problem at hand and the a 
priori information the researcher has. But is it possible to develop a statistical test of 
exogeneity, in the manner of Granger’s causality test? 

The Hausman test discussed in Section 19.4 can be utilized to answer this question. Suppose we have a three-equation model in three endogenous variables, $Y_1$, $Y_2$, and $Y_3$, and suppose there are three exogenous variables, $X_1$, $X_2$, and $X_3$. Further, suppose that the first equation of the model is

$Y_{1i} = \beta_0 + \beta_2 Y_{2i} + \beta_3 Y_{3i} + \alpha_1 X_{1i} + u_{1i}$ (19.5.1)

If $Y_2$ and $Y_3$ are truly endogenous, we cannot estimate Eq. (19.5.1) by OLS (why?). But how do we find that out? We can proceed as follows. We obtain the reduced-form equations for $Y_2$ and $Y_3$ (Note: the reduced-form equations will have only predetermined variables on the right-hand side). From these reduced-form equations, we obtain $\hat{Y}_{2i}$ and $\hat{Y}_{3i}$, the predicted values of $Y_{2i}$ and $Y_{3i}$, respectively. Then in the spirit of the Hausman test discussed earlier, we can estimate the following equation by OLS:

$Y_{1i} = \beta_0 + \beta_2 Y_{2i} + \beta_3 Y_{3i} + \alpha_1 X_{1i} + \lambda_2 \hat{Y}_{2i} + \lambda_3 \hat{Y}_{3i} + u_{1i}$ (19.5.2)

Using the F test, we test the hypothesis that $\lambda_2 = \lambda_3 = 0$. If this hypothesis is rejected, $Y_2$ and $Y_3$ can be deemed endogenous, but if it is not rejected, they can be treated as exogenous. For a concrete example, see Exercise 19.16.
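The F test of $\lambda_2 = \lambda_3 = 0$ compares the residual sum of squares (RSS) of Eq. (19.5.2) with that of the restricted regression that drops $\hat{Y}_{2i}$ and $\hat{Y}_{3i}$, that is, Eq. (19.5.1). A minimal sketch of the statistic follows; the RSS values and the sample size are hypothetical numbers used only to show the arithmetic, and `restricted_f` is a hypothetical helper name.

```python
def restricted_f(rss_restricted, rss_unrestricted, q, n, k):
    """F statistic for q linear restrictions, here lambda_2 = lambda_3 = 0 (q = 2).

    rss_restricted:   RSS of Eq. (19.5.1), i.e., Eq. (19.5.2) without the Y-hat terms
    rss_unrestricted: RSS of Eq. (19.5.2)
    n: number of observations; k: parameters in Eq. (19.5.2), which has 6
    """
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))


# Hypothetical RSS values and sample size, for illustration only
F_stat = restricted_f(rss_restricted=120.0, rss_unrestricted=100.0, q=2, n=50, k=6)
print(round(F_stat, 2))  # 4.4
```

The statistic is then compared with the critical value of the F distribution with q and n - k degrees of freedom (here 2 and 44); a value above it leads to rejecting $\lambda_2 = \lambda_3 = 0$ and treating $Y_2$ and $Y_3$ as endogenous.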

*Optional.

14 As in footnote 12, the authors use AID rather than $\widehat{\mathrm{AID}}$ as the regressor.
Summary and Conclusions

1. The problem of identification precedes the problem of estimation.

2. The identification problem asks whether one can obtain unique numerical estimates of the structural coefficients from the estimated reduced-form coefficients.

3. If this can be done, an equation in a system of simultaneous equations is identified. If this cannot be done, that equation is un- or underidentified.

4. An identified equation can be just identified or overidentified. In the former case, unique values of structural coefficients can be obtained; in the latter, there may be more than one value for one or more structural parameters.

5. The identification problem arises because the same set of data may be compatible with different sets of structural coefficients, that is, different models. Thus, in the regression of price on quantity only, it is difficult to tell whether one is estimating the supply function or the demand function, because price and quantity enter both equations.

6. To assess the identifiability of a structural equation, one may apply the technique of reduced-form equations, which expresses an endogenous variable solely as a function of predetermined variables.

7. However, this time-consuming procedure can be avoided by resorting to either the order condition or the rank condition of identification. Although the order condition is easy to apply, it provides only a necessary condition for identification. On the other hand, the rank condition is both a necessary and sufficient condition for identification. If the rank condition is satisfied, the order condition is satisfied, too, although the converse is not true. In practice, though, the order condition is generally adequate to ensure identifiability.

8. In the presence of simultaneity, OLS is generally not applicable, as was shown in Chapter 18. But if one wants to use it nonetheless, it is imperative to test for simultaneity explicitly. The Hausman specification test can be used for this purpose.

9. Although in practice deciding whether a variable is endogenous or exogenous is a matter of judgment, one can use the Hausman specification test to determine whether a variable or group of variables is endogenous or exogenous.

10. Although they are in the same family, the concepts of causality and exogeneity are different and one may not necessarily imply the other. In practice it is better to keep those concepts separate (see Section 17.14).

EXERCISES

Questions

19.1. Show that the two definitions of the order condition of identification (see Section 19.3) are equivalent.

19.2. Deduce the structural coefficients from the reduced-form coefficients given in 
Eqs. (19.2.25) and (19.2.27). 

19.3. Obtain the reduced form of the following models and determine in each case whether 
the structural equations are unidentified, just identified, or overidentified: 

a. Chap. 18, Example 18.2. 

b. Chap. 18, Example 18.3. 

c. Chap. 18, Example 18.6. 

19.4. Check the identifiability of the models of Exercise 19.3 by applying both the order 
and rank conditions of identification. 

19.5. In the model (19.2.22) of the text it was shown that the supply equation was overidentified. What restrictions, if any, on the structural parameters will make this equation just identified? Justify the restrictions you impose.
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TABLE 19.3 


19.6. From the model 

Yy — Pio + PnY 2 t + YnXu + uy 
Yu — P 2 o + Pi\ Y\ t + YnXit + U2t 
the following reduced-form equations are obtained: 

Y\t — nio + rijiJfu + TiuX2t + Wt 

Yit — n 20 + n 2 iXu + TI22X2, + v t 

a. Are the structural equations identified? 

b. What happens to identification if it is known a priori that Yn =0? 

19.7. Refer to Exercise 19.6. The estimated reduced-form equations are as follows: 

Y u = 4 + 3 Xu + SX 2t 
Y lt = 2 + 6 X lt + 10X 2( 

a. Obtain the values of the structural parameters. 

b. How would you test the null hypothesis that yn =0? 

19.8. The model 

Y\ t — P 10 + PnY 2 t + YnX lt + u\ t 
Y2t — P 20 + Pn Y\ t + U2t 
produces the following reduced-form equations: 

Y lt = 4+SXu 
Y 2t = 2 + \2Xu 

a. Which structural coefficients, if any, can be estimated from the reduced-form 
coefficients? Demonstrate your contention. 

b. How does the answer to ( a ) change if it is known a priori that (1) /h 2 = 0 and 
(2) £10 = 0? 

19.9. Determine whether the structural equations of the model given in Exercise 18.8 are 
identified. 

19.10. Refer to Exercise 18.7 and find out which structural equations can be identified. 

19.11. Table 19.3 is a model in five equations with five endogenous variables Y and four 
exogenous variables X: 


TABLE 19.3
Coefficients of the Variables

Equation No.   Y1    Y2    Y3    Y4    Y5    X1    X2    X3    X4
1              1     β12   0     β14   0     γ11   0     0     γ14
2              0     1     β23   β24   0     0     γ22   γ23   0
3              β31   0     1     β34   β35   0     0     γ33   γ34
4              0     β42   0     1     0     γ41   0     γ43   0
5              β51   0     0     β54   1     0     γ52   γ53   0


Determine the identifiability of each equation with the aid of the order and rank conditions of identification.

19.12. Consider the following extended Keynesian model of income determination: 


Consumption function: $C_t = \beta_1 + \beta_2 Y_t - \beta_3 T_t + u_{1t}$
Investment function: $I_t = \alpha_0 + \alpha_1 Y_{t-1} + u_{2t}$
Taxation function: $T_t = \gamma_0 + \gamma_1 Y_t + u_{3t}$
Income identity: $Y_t = C_t + I_t + G_t$
where C = consumption expenditure
Y = income
I = investment
T = taxes
G = government expenditure
u's = the disturbance terms

In the model the endogenous variables are C, I, T, and Y and the predetermined variables are G and $Y_{t-1}$.

By applying the order condition, check the identifiability of each of the equations in the system and of the system as a whole. What would happen if $r_t$, the interest rate, assumed to be exogenous, were to appear on the right-hand side of the investment function?

19.13. Refer to the data given in Table 18.1 of Chapter 18. Using these data, estimate the reduced-form regressions (19.1.2) and (19.1.4). Can you estimate $\beta_0$ and $\beta_1$? Show your calculations. Is the model identified? Why or why not?

19.14. Suppose we propose yet another definition of the order condition of identifiability: 

$$K \geq m + k - 1$$

which states that the number of predetermined variables in the system can be no 
less than the number of unknown coefficients in the equation to be identified. Show 
that this definition is equivalent to the two other definitions of the order condition 
given in the text. 

19.15. A simplified version of Suits’s model of the watermelon market is as follows:* 

Demand equation: $P_t = \alpha_0 + \alpha_1 (Q_t/N_t) + \alpha_2 (Y_t/N_t) + \alpha_3 F_t + u_{1t}$

Crop supply function: $Q_t = \beta_0 + \beta_1 (P_t/W_t) + \beta_2 P_{t-1} + \beta_3 C_{t-1} + \beta_4 T_{t-1} + u_{2t}$

where P = price
(Q/N) = per capita quantity demanded
(Y/N) = per capita income
F = freight costs
(P/W) = price relative to the farm wage rate
C = price of cotton
T = price of other vegetables
N = population

P and Q are the endogenous variables. 

a. Obtain the reduced form. 

b. Determine whether the demand, the supply, or both functions are identified. 

Empirical Exercises 

19.16. Consider the following demand-and-supply model for money: 

Money demand: $M_t^d = \beta_0 + \beta_1 Y_t + \beta_2 R_t + \beta_3 P_t + u_{1t}$
Money supply: $M_t^s = \alpha_0 + \alpha_1 Y_t + u_{2t}$


*D. B. Suits, "An Econometric Model of the Watermelon Market," Journal of Farm Economics, vol. 37, 1955, pp. 237-251.
TABLE 19.4
Money, GDP, Interest Rate, and Consumer Price Index, United States, 1970-2006

Observation   M2         GDP         TBRATE    CPI
1970          626.5      3,771.9     6.458     38.8
1971          710.3      3,898.6     4.348     40.5
1972          802.3      4,105.0     4.071     41.8
1973          855.5      4,341.5     7.041     44.4
1974          902.1      4,319.6     7.886     49.3
1975          1,016.2    4,311.2     5.838     53.8
1976          1,152.0    4,540.9     4.989     56.9
1977          1,270.3    4,750.5     5.265     60.6
1978          1,366.0    5,015.0     7.221     65.2
1979          1,473.7    5,173.4     10.041    72.6
1980          1,599.8    5,161.7     11.506    82.4
1981          1,755.5    5,291.7     14.029    90.9
1982          1,910.1    5,189.3     10.686    96.5
1983          2,126.4    5,423.8     8.63      99.6
1984          2,309.8    5,813.6     9.58      103.9
1985          2,495.5    6,053.7     7.48      107.6
1986          2,732.2    6,263.6     5.98      109.6
1987          2,831.3    6,475.1     5.82      113.6
1988          2,994.3    6,742.7     6.69      118.3
1989          3,158.3    6,981.4     8.12      124.0
1990          3,277.7    7,112.5     7.51      130.7
1991          3,378.3    7,100.5     5.42      136.2
1992          3,431.8    7,336.6     3.45      140.3
1993          3,482.5    7,532.7     3.02      144.5
1994          3,498.5    7,835.5     4.29      148.2
1995          3,641.7    8,031.7     5.51      152.4
1996          3,820.5    8,328.9     5.02      156.9
1997          4,035.0    8,703.5     5.07      160.5
1998          4,381.8    9,066.9     4.81      163.0
1999          4,639.2    9,470.3     4.66      166.6
2000          4,921.7    9,817.0     5.85      172.2
2001          5,433.5    9,890.7     3.45      177.1
2002          5,779.2    10,048.8    1.62      179.9
2003          6,071.2    10,301.0    1.02      184.0
2004          6,421.6    10,675.8    1.38      188.9
2005          6,691.7    11,003.4    3.16      195.3
2006          7,035.5    11,319.4    4.73      201.6

Notes: M2 = M2 money supply (billions of dollars).
GDP = gross domestic product (billions of dollars).
TBRATE = 3-month Treasury bill rate, %.
CPI = Consumer Price Index (1982-1984 = 100).
Source: Economic Report of the President, Tables B-60, B-69, B-73.


where M = money
Y = income
R = rate of interest
P = price
u's = error terms

Assume that R and P are exogenous and M and Y are endogenous. Table 19.4 gives data on M (M2 definition), Y (GDP), R (3-month Treasury bill rate), and P (Consumer Price Index) for the United States for 1970-2006.
a. Is the demand function identified?

b. Is the supply function identified?

c. Obtain the expressions for the reduced-form equations for M and Y.

d. Apply the test of simultaneity to the supply function.

e. How would we find out if Y in the money supply function is in fact endogenous?

19.17. The Hausman test discussed in the text can also be conducted in the following way. 

Consider Eq. (19.4.7):

$$Q_t = \beta_0 + \beta_1 \hat{P}_t + \beta_1 \hat{v}_t + u_{2t}$$

a. Since $\hat{P}_t$ and $\hat{v}_t$ have the same coefficient, how would you test whether that is indeed the case in a given application? What are the implications of this?

b. Since $\hat{P}_t$ is uncorrelated with $u_{2t}$ by design (why?), one way to find out if $P_t$ is exogenous is to see if $\hat{v}_t$ is correlated with $u_{2t}$. How would you go about testing this? Which test do you use? (Hint: Substitute $P_t$ from [19.4.6] into Eq. [19.4.7].)



Chapter 20

Simultaneous-Equation Methods


Having discussed the nature of simultaneous-equation models in the previous two chapters, in this chapter we turn to the problem of estimating the parameters of such models. At the outset it may be noted that the estimation problem is rather complex because there are a variety of estimation techniques with varying statistical properties. In view of the introductory nature of this text, we shall consider only a few of these techniques. Our discussion will be simple and often heuristic, the finer points being left to the references.

20.1 Approaches to Estimation 

If we consider the general model of M equations in M endogenous variables given in Eq. (19.1.1), we may adopt two approaches to estimate the structural equations, namely, single-equation methods, also known as limited information methods, and system methods, also known as full information methods. In the single-equation methods to be considered shortly, we estimate each equation in the system (of simultaneous equations) individually, taking into account any restrictions placed on that equation (such as exclusion of some variables) without worrying about the restrictions on the other equations in the system,¹ hence the name limited information methods. In the system methods, on the other hand, we estimate all the equations in the model simultaneously, taking due account of all restrictions on such equations by the omission or absence of some variables (recall that for identification such restrictions are essential), hence the name full information methods.

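As an example, consider the following four-equation model:

$$
\begin{aligned}
Y_{1t} &= \beta_{10} + \beta_{12}Y_{2t} + \beta_{13}Y_{3t} + \gamma_{11}X_{1t} + u_{1t} \\
Y_{2t} &= \beta_{20} + \beta_{23}Y_{3t} + \gamma_{21}X_{1t} + \gamma_{22}X_{2t} + u_{2t} \\
Y_{3t} &= \beta_{30} + \beta_{31}Y_{1t} + \beta_{34}Y_{4t} + \gamma_{31}X_{1t} + \gamma_{32}X_{2t} + u_{3t} \\
Y_{4t} &= \beta_{40} + \beta_{42}Y_{2t} + \gamma_{43}X_{3t} + u_{4t}
\end{aligned}
\tag{20.1.1}
$$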


1 For the purpose of identification, however, information provided by other equations will have to be
taken into account. But as noted in Chapter 19, estimation is possible only in the case of (fully or 
over-) identified equations. In this chapter we assume that the identification problem is solved using 
the techniques of Chapter 19. 
where the Y's are the endogenous variables and the X's are the exogenous variables. If we are interested in estimating, say, the third equation, the single-equation methods will consider this equation only, noting that variables $Y_2$ and $X_3$ are excluded from it. In the systems methods, on the other hand, we try to estimate all four equations simultaneously, taking into account all the restrictions imposed on the various equations of the system.

To preserve the spirit of simultaneous-equation models, ideally one should use a systems method, such as the full information maximum likelihood (FIML) method.² In practice, however, such methods are not commonly used, for a variety of reasons. First, the computational burden is enormous. For example, the comparatively small (20-equation) 1955 Klein-Goldberger model of the U.S. economy had 151 nonzero coefficients, of which the authors estimated only 51 coefficients using time series data. The Brookings-Social Science Research Council (SSRC) econometric model of the U.S. economy published in 1965 initially had 150 equations.³ Although such elaborate models may furnish finer details of the various sectors of the economy, the computations are a stupendous task even in these days of high-speed computers, not to mention the cost involved. Second, the systems methods, such as FIML, lead to solutions that are highly nonlinear in the parameters and are therefore often difficult to determine. Third, if there is a specification error (say, a wrong functional form or exclusion of relevant variables) in one or more equations of the system, that error is transmitted to the rest of the system. As a result, the systems methods become very sensitive to specification errors.

In practice, therefore, single-equation methods are often used. As Klein puts it, 

Single equation methods, in the context of a simultaneous system, may be less sensitive to 
specification error in the sense that those parts of the system that are correctly specified may 
not be affected appreciably by errors in specification in another part.⁴

In the rest of the chapter we shall deal with single-equation methods only. Specifically, 
we shall discuss the following single-equation methods: 

1. Ordinary least squares (OLS) 

2. Indirect least squares (ILS) 

3. Two-stage least squares (2SLS) 

20.2 Recursive Models and Ordinary Least Squares 


We saw in Chapter 18 that, because of the interdependence between the stochastic disturbance term and the endogenous explanatory variable(s), the OLS method is inappropriate for the estimation of an equation in a system of simultaneous equations. If it is applied erroneously, then, as we saw in Section 18.3, the estimators are not only biased (in small samples) but also inconsistent; that is, the bias does not disappear no matter how large the sample size. There is, however, one situation where OLS can be applied appropriately even in the context of simultaneous equations. This is the case of the recursive, triangular, or


2 For a simple discussion of this method, see Carl F. Christ, Econometric Models and Methods, John Wiley & Sons, New York, 1966, pp. 395-401.

3 James S. Duesenberry, Gary Fromm, Lawrence R. Klein, and Edwin Kuh, eds., A Quarterly Model of the
United States Economy, Rand McNally, Chicago, 1965. 

4 Lawrence R. Klein, A Textbook of Econometrics, 2d ed., Prentice Hall, Englewood Cliffs, NJ, 1974, 
p. 150. 
FIGURE 20.1

Recursive model.


causal models. To see the nature of these models, consider the following three-equation 
system: 

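$$
\begin{aligned}
Y_{1t} &= \beta_{10} + \gamma_{11}X_{1t} + \gamma_{12}X_{2t} + u_{1t} \\
Y_{2t} &= \beta_{20} + \beta_{21}Y_{1t} + \gamma_{21}X_{1t} + \gamma_{22}X_{2t} + u_{2t} \\
Y_{3t} &= \beta_{30} + \beta_{31}Y_{1t} + \beta_{32}Y_{2t} + \gamma_{31}X_{1t} + \gamma_{32}X_{2t} + u_{3t}
\end{aligned}
\tag{20.2.1}
$$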

where, as usual, the Y's and the X's are, respectively, the endogenous and exogenous variables. The disturbances are such that

$$\operatorname{cov}(u_{1t}, u_{2t}) = \operatorname{cov}(u_{1t}, u_{3t}) = \operatorname{cov}(u_{2t}, u_{3t}) = 0$$

that is, the same-period disturbances in different equations are uncorrelated (technically, this is the assumption of zero contemporaneous correlation).

Now consider the first equation of (20.2.1). Since it contains only the exogenous variables on the right-hand side and since by assumption they are uncorrelated with the disturbance term $u_{1t}$, this equation satisfies the critical assumption of classical OLS, namely, uncorrelatedness between the explanatory variables and the stochastic disturbances. Hence, OLS can be applied straightforwardly to this equation. Next consider the second equation of (20.2.1), which contains the endogenous variable $Y_1$ as an explanatory variable along with the nonstochastic X's. OLS can also be applied to this equation, provided $Y_{1t}$ and $u_{2t}$ are uncorrelated. Is this so? The answer is yes, because $u_1$, which affects $Y_1$, is by assumption uncorrelated with $u_2$. Therefore, for all practical purposes, $Y_1$ is a predetermined variable insofar as $Y_2$ is concerned. Hence, one can proceed with OLS estimation of this equation. Carrying this argument a step further, we can also apply OLS to the third equation in (20.2.1) because both $Y_1$ and $Y_2$ are uncorrelated with $u_3$.

Thus, in the recursive system OLS can be applied to each equation separately. Actually, we do not have a simultaneous-equation problem in this situation. From the structure of such systems, it is clear that there is no two-way interdependence among the endogenous variables. Thus, $Y_1$ affects $Y_2$, but $Y_2$ does not affect $Y_1$. Similarly, $Y_1$ and $Y_2$ influence $Y_3$ without, in turn, being influenced by $Y_3$. In other words, each equation exhibits a unilateral causal dependence, hence the name causal models.⁵ Schematically, we have Figure 20.1.




5 The alternative name triangular stems from the fact that if we form the matrix of the coefficients of the endogenous variables given in Eq. (20.2.1), we obtain the following triangular matrix:

$$
\begin{array}{lccc}
 & Y_1 & Y_2 & Y_3 \\
\text{Equation 1} & 1 & 0 & 0 \\
\text{Equation 2} & \beta_{21} & 1 & 0 \\
\text{Equation 3} & \beta_{31} & \beta_{32} & 1
\end{array}
$$

Note that the entries above the main diagonal are zeros (why?).
As an example of a recursive system, one may postulate the following model of wage and price determination:

$$
\begin{aligned}
\text{Price equation:}\quad \dot{P}_t &= \beta_{10} + \beta_{11}\dot{W}_{t-1} + \beta_{12}\dot{R}_t + \beta_{13}\dot{M}_t + \beta_{14}\dot{L}_t + u_{1t} \\
\text{Wage equation:}\quad \dot{W}_t &= \beta_{20} + \beta_{21}\mathit{UN}_t + \beta_{22}\dot{P}_t + u_{2t}
\end{aligned}
\tag{20.2.2}
$$

where $\dot{P}$ = rate of change of price per unit of output
$\dot{W}$ = rate of change of wages per employee
$\dot{R}$ = rate of change of price of capital
$\dot{M}$ = rate of change of import prices
$\dot{L}$ = rate of change of labor productivity
UN = unemployment rate, %⁶

The price equation postulates that the rate of change of price in the current period is a function of the rates of change in the prices of capital and of raw material, the rate of change in labor productivity, and the rate of change in wages in the previous period. The wage equation shows that the rate of change in wages in the current period is determined by the current-period rate of change in price and the unemployment rate. It is clear that the causal chain runs from $\dot{W}_{t-1} \to \dot{P}_t \to \dot{W}_t$, and hence OLS may be applied to estimate the parameters of the two equations individually.

Although recursive models have proved to be useful, most simultaneous-equation models do not exhibit such a unilateral cause-and-effect relationship. Therefore, OLS, in general, is inappropriate to estimate a single equation in the context of a simultaneous-equation model.⁷

There are some who argue that, although OLS is generally inapplicable to simultaneous- 
equation models, one can use it, if only as a standard or norm of comparison. That is, one 
can estimate a structural equation by OLS, with the resulting properties of biasedness, 
inconsistency, etc. Then the same equation may be estimated by other methods especially 
designed to handle the simultaneity problem and the results of the two methods compared, 
at least qualitatively. In many applications the results of the inappropriately applied 
OLS may not differ very much from those obtained by more sophisticated methods, as 
we shall see later. In principle, one should not have much objection to the production of 
the results based on OLS so long as estimates based on alternative methods devised for 
simultaneous-equation models are also given. In fact, this approach might give us some 
idea about how badly OLS does in situations when it is applied inappropriately.⁸


6 Note: The dotted symbol means "time derivative." For example, $\dot{P} = dP/dt$. For discrete time series, $dP/dt$ is sometimes approximated by $\Delta P/\Delta t$, where the symbol $\Delta$ is the first difference operator, which was originally introduced in Chapter 12.

7 It is important to keep in mind that we are assuming that the disturbances across equations are contemporaneously uncorrelated. If this is not the case, we may have to resort to the Zellner SURE (seemingly unrelated regressions) estimation technique to estimate the parameters of the recursive system. See A. Zellner, "An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias," Journal of the American Statistical Association, vol. 57, 1962, pp. 348-368.

8 It may also be noted that in small samples the alternative estimators, like the OLS estimators, are also biased. But the OLS estimator has the "virtue" that it has minimum variance among these alternative estimators. But this is true of small samples only.
20.3 Estimation of a Just Identified Equation: The Method
of Indirect Least Squares (ILS)

For a just or exactly identified structural equation, the method of obtaining the estimates of 
the structural coefficients from the OLS estimates of the reduced-form coefficients is known 
as the method of indirect least squares (ILS), and the estimates thus obtained are known 
as the indirect least-squares estimates. ILS involves the following three steps: 

Step 1. We first obtain the reduced-form equations. As noted in Chapter 19, these 
reduced-form equations are obtained from the structural equations in such a manner 
that the dependent variable in each equation is the only endogenous variable and is a 
function solely of the predetermined (exogenous or lagged endogenous) variables and 
the stochastic error term(s). 

Step 2. We apply OLS to the reduced-form equations individually. This operation is 
permissible since the explanatory variables in these equations are predetermined and 
hence uncorrelated with the stochastic disturbances. The estimates thus obtained are 
consistent. 9 

Step 3. We obtain estimates of the original structural coefficients from the estimated 
reduced-form coefficients obtained in Step 2. As noted in Chapter 19, if an equation is 
exactly identified, there is a one-to-one correspondence between the structural and 
reduced-form coefficients; that is, one can derive unique estimates of the former from 
the latter. 

As this three-step procedure indicates, the name ILS derives from the fact that structural 
coefficients (the object of primary enquiry in most cases) are obtained indirectly from the 
OLS estimates of the reduced-form coefficients. 

An Illustrative Example 

Consider the demand-and-supply model introduced in Section 19.2, which for convenience 
is given below with a slight change in notation: 

$$\text{Demand function:}\quad Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 X_t + u_{1t} \tag{20.3.1}$$

$$\text{Supply function:}\quad Q_t = \beta_0 + \beta_1 P_t + u_{2t} \tag{20.3.2}$$

where Q = quantity
P = price
X = income or expenditure

Assume that X is exogenous. As noted previously, the supply function is exactly identified 
whereas the demand function is not identified. 

The reduced-form equations corresponding to the preceding structural equations are 
$$P_t = \Pi_0 + \Pi_1 X_t + w_t \tag{20.3.3}$$

$$Q_t = \Pi_2 + \Pi_3 X_t + v_t \tag{20.3.4}$$


9 In addition to being consistent, the estimates "may be best unbiased and/or asymptotically efficient, depending respectively upon whether (i) the z's [= X's] are exogenous and not merely predetermined [i.e., do not contain lagged values of endogenous variables] and/or (ii) the distribution of the disturbances is normal." See W. C. Hood and Tjalling C. Koopmans, Studies in Econometric Method, John Wiley & Sons, New York, 1953, p. 133.
where the Π's are the reduced-form coefficients and are (nonlinear) combinations of the structural coefficients, as shown in Eqs. (19.2.16) and (19.2.18), and where w and v are linear combinations of the structural disturbances $u_1$ and $u_2$.

Notice that each reduced-form equation contains only one endogenous variable, which 
is the dependent variable and which is a function solely of the exogenous variable X 
(income) and the stochastic disturbances. Hence, the parameters of the preceding reduced- 
form equations may be estimated by OLS. These estimates are 


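$$\hat{\Pi}_1 = \frac{\sum p_t x_t}{\sum x_t^2} \tag{20.3.5}$$

$$\hat{\Pi}_0 = \bar{P} - \hat{\Pi}_1 \bar{X} \tag{20.3.6}$$

$$\hat{\Pi}_3 = \frac{\sum q_t x_t}{\sum x_t^2} \tag{20.3.7}$$

$$\hat{\Pi}_2 = \bar{Q} - \hat{\Pi}_3 \bar{X} \tag{20.3.8}$$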

where the lowercase letters, as usual, denote deviations from sample means and where $\bar{Q}$ and $\bar{P}$ are the sample mean values of Q and P. As noted previously, the $\hat{\Pi}_i$'s are consistent estimators and under appropriate assumptions are also minimum variance unbiased or asymptotically efficient (see footnote 9).

Since our primary objective is to determine the structural coefficients, let us see if we 
can estimate them from the reduced-form coefficients. Now as shown in Section 19.2, the 
supply function is exactly identified. Therefore, its parameters can be estimated uniquely 
from the reduced-form coefficients as follows: 

$$\beta_0 = \Pi_2 - \beta_1 \Pi_0 \qquad\text{and}\qquad \beta_1 = \frac{\Pi_3}{\Pi_1}$$

Hence, the estimates of these parameters can be obtained from the estimates of the 
reduced-form coefficients as 


$$\hat{\beta}_0 = \hat{\Pi}_2 - \hat{\beta}_1 \hat{\Pi}_0 \tag{20.3.9}$$

$$\hat{\beta}_1 = \frac{\hat{\Pi}_3}{\hat{\Pi}_1} \tag{20.3.10}$$


which are the ILS estimators. Note that the parameters of the demand function cannot be 
thus estimated (however, see Exercise 20.13). 
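The relations between the supply parameters and the reduced-form coefficients follow from solving (20.3.1) and (20.3.2) for the equilibrium values of P and Q. A short derivation in the notation of (20.3.1)-(20.3.4), assuming $\alpha_1 \neq \beta_1$, is sketched below; the arrangement of terms may differ slightly from Eqs. (19.2.16) and (19.2.18).

$$
\begin{aligned}
\alpha_0 + \alpha_1 P_t + \alpha_2 X_t + u_{1t} &= \beta_0 + \beta_1 P_t + u_{2t} \\
\Rightarrow\quad P_t &= \underbrace{\frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}}_{\Pi_0}
 + \underbrace{\frac{-\alpha_2}{\alpha_1 - \beta_1}}_{\Pi_1} X_t
 + \underbrace{\frac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1}}_{w_t} \\
Q_t = \beta_0 + \beta_1 P_t + u_{2t} &= \underbrace{(\beta_0 + \beta_1 \Pi_0)}_{\Pi_2}
 + \underbrace{\beta_1 \Pi_1}_{\Pi_3} X_t
 + \underbrace{(\beta_1 w_t + u_{2t})}_{v_t}
\end{aligned}
$$

Hence $\beta_1 = \Pi_3/\Pi_1$ and $\beta_0 = \Pi_2 - \beta_1 \Pi_0$. The division by $\Pi_1$ also shows why the income variable matters: if $\alpha_2 = 0$, then $\Pi_1 = 0$ and the supply function could not be recovered in this way.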

To give some numerical results, we obtained the data shown in Table 20.1. First we estimate the reduced-form equations, regressing price and quantity separately on per capita real consumption expenditure. The results are as follows:


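$$
\begin{aligned}
\hat{P}_t &= 90.9601 + 0.0007X_t \\
\text{se} &= (4.0517) \quad (0.0002) \\
t &= (22.4499) \quad (3.0060) \qquad R^2 = 0.2440
\end{aligned}
\tag{20.3.11}
$$

$$
\begin{aligned}
\hat{Q}_t &= 59.7618 + 0.0020X_t \\
\text{se} &= (1.5600) \quad (0.00009) \\
t &= (38.3080) \quad (20.9273) \qquad R^2 = 0.9399
\end{aligned}
\tag{20.3.12}
$$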


Using Eqs. (20.3.9) and (20.3.10), we obtain these ILS estimates:

$$\hat{\beta}_0 = -183.7043 \tag{20.3.13}$$

$$\hat{\beta}_1 = 2.6766 \tag{20.3.14}$$




TABLE 20.1
Crop Production, Crop Prices, and per Capita Personal Consumption Expenditures, 2007 Dollars, United States, 1975-2004

Observation   Index of Crop      Index of Crop Prices    Real per Capita Personal
              Production         Received by Farmers     Consumption Expenditure, X
              (1996 = 100), Q    (1990-1992 = 100), P
1975          66                 88                      4,789
1976          67                 87                      5,282
1977          71                 83                      5,804
1978          73                 89                      6,417
1979          78                 98                      7,073
1980          75                 107                     7,716
1981          81                 111                     8,439
1982          82                 98                      8,945
1983          71                 108                     9,775
1984          81                 111                     10,589
1985          85                 98                      11,406
1986          82                 87                      12,048
1987          84                 86                      12,766
1988          80                 104                     13,685
1989          86                 109                     14,546
1990          90                 103                     15,349
1991          90                 101                     15,722
1992          96                 101                     16,485
1993          91                 102                     17,204
1994          101                105                     18,004
1995          96                 112                     18,665
1996          100                127                     19,490
1997          104                115                     20,323
1998          105                107                     21,291
1999          108                97                      22,491
2000          108                96                      23,862
2001          108                99                      24,722
2002          107                105                     25,501
2003          108                111                     26,463
2004          112                117                     27,937

Source: Economic Report of the President, 2007. Data on Q (Table B-99), on P (Table B-101), and on X (Table B-31).


Therefore, the estimated ILS regression is¹⁰

$$\hat{Q}_t = -183.7043 + 2.6766P_t \tag{20.3.15}$$

For comparison, we give the results of the (inappropriately applied) OLS regression of 
Q on P: 

$$
\begin{aligned}
\hat{Q}_t &= 20.89 + 0.673P_t \\
\text{se} &= (23.04) \quad (0.2246) \\
t &= (0.91) \quad (2.99) \qquad R^2 = 0.2430
\end{aligned}
\tag{20.3.16}
$$

These results show how OLS can distort the "true" picture when it is applied in inappropriate situations.


10 We have not presented the standard errors of the estimated structural coefficients because, as 
noted previously, these coefficients are generally nonlinear functions of the reduced-form coefficients 
and there is no simple method of estimating their standard errors from the standard errors of the 
reduced-form coefficients. For large samples, however, standard errors of the structural
coefficients can be obtained approximately. For details, see Jan Kmenta, Elements of Econometrics, 
Macmillan, New York, 1971, p. 444. 
Properties of ILS Estimators

We have seen that the estimators of the reduced-form coefficients are consistent and under appropriate assumptions also best unbiased or asymptotically efficient (see footnote 9). Do these properties carry over to the ILS estimators? It can be shown that the ILS estimators inherit all the asymptotic properties of the reduced-form estimators, such as consistency and asymptotic efficiency. But small-sample properties such as unbiasedness do not generally hold. It is shown in Appendix 20A, Section 20A.1, that the ILS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ of the supply function given previously are biased but the bias disappears as the sample size increases indefinitely (that is, the estimators are consistent).¹¹
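The consistency claim can be illustrated with a small Monte Carlo experiment. The sketch below (Python with NumPy) uses hypothetical parameter values; `ALPHA0`, `ALPHA1`, `ALPHA2`, `BETA0`, and `BETA1` are our own illustrative choices, not taken from the text. It generates data from (20.3.1)-(20.3.2) with independent normal disturbances and records the ILS estimate of $\beta_1$ for several sample sizes. We summarize each sampling distribution by its median and median absolute error rather than by its mean, because a ratio estimator such as $\hat{\Pi}_3/\hat{\Pi}_1$ can have very heavy tails in small samples.

```python
import numpy as np

# Hypothetical structural parameters (illustrative only).
ALPHA0, ALPHA1, ALPHA2 = 10.0, -1.0, 1.0   # demand: Q = a0 + a1*P + a2*X + u1
BETA0, BETA1 = 2.0, 1.5                     # supply: Q = b0 + b1*P + u2


def ils_beta1(n, rng):
    """Simulate one sample of size n and return the ILS estimate of beta1."""
    X = rng.normal(0.0, 2.0, size=n)
    u1, u2 = rng.normal(size=(2, n))
    # Market-clearing price obtained by equating demand and supply.
    P = (BETA0 - ALPHA0 - ALPHA2 * X + u2 - u1) / (ALPHA1 - BETA1)
    Q = BETA0 + BETA1 * P + u2
    x = X - X.mean()
    # Pi3_hat / Pi1_hat; the common denominator sum(x**2) cancels.
    return (x @ (Q - Q.mean())) / (x @ (P - P.mean()))


rng = np.random.default_rng(42)
summary = {}
for n in (25, 250, 2500):
    draws = np.array([ils_beta1(n, rng) for _ in range(2000)])
    summary[n] = (np.median(draws), np.median(np.abs(draws - BETA1)))
    print(f"n = {n:5d}: median = {summary[n][0]:.4f}, "
          f"median |error| = {summary[n][1]:.4f}")
```

The typical error should shrink steadily as n grows, which is what consistency means in practice; with only 25 observations, individual ILS estimates can still land far from the assumed value of 1.5.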

20.4 Estimation of an Overidentified Equation: The Method 
of Two-Stage Least Squares (2SLS) 

Consider the following model: 

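$$\text{Income function:}\quad Y_{1t} = \beta_{10} + \beta_{12}Y_{2t} + \gamma_{11}X_{1t} + \gamma_{12}X_{2t} + u_{1t} \tag{20.4.1}$$

$$\text{Money supply function:}\quad Y_{2t} = \beta_{20} + \beta_{21}Y_{1t} + u_{2t} \tag{20.4.2}$$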

where $Y_1$ = income
$Y_2$ = stock of money
$X_1$ = investment expenditure
$X_2$ = government expenditure on goods and services

The variables $X_1$ and $X_2$ are exogenous.

The income equation, a hybrid of the quantity-theory and Keynesian approaches to income determination, states that income is determined by money supply, investment expenditure, and government expenditure. The money supply function postulates that the stock of money is determined (by the Federal Reserve System) on the basis of the level of income. Obviously, we have a simultaneous-equation problem, which can be checked by the simultaneity test discussed in Chapter 19.

Applying the order condition of identification, we can see that the income equation is underidentified whereas the money supply equation is overidentified. There is not much that can be done about the income equation short of changing the model specification. The overidentified money supply function may not be estimated by ILS because there are two estimates of $\beta_{21}$ (the reader should verify this via the reduced-form coefficients).
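To see the two competing estimates numerically, the sketch below (Python with NumPy) simulates the model (20.4.1)-(20.4.2) with hypothetical parameter values (`B10`, `B12`, `G11`, `G12`, `B20`, `B21` are our own illustrative choices) and independent normal disturbances. It estimates both reduced-form equations by OLS, writing them (in our notation) as $Y_{1t} = \Pi_{10} + \Pi_{11}X_{1t} + \Pi_{12}X_{2t} + w_t$ and $Y_{2t} = \Pi_{20} + \Pi_{21}X_{1t} + \Pi_{22}X_{2t} + v_t$. Because $Y_2 = \beta_{20} + \beta_{21}Y_1 + u_2$, the coefficients satisfy $\Pi_{21} = \beta_{21}\Pi_{11}$ and $\Pi_{22} = \beta_{21}\Pi_{12}$, so each ratio is an ILS estimate of $\beta_{21}$.

```python
import numpy as np


def reduced_form(y, *exog):
    """OLS coefficients of y on a constant and all exogenous variables."""
    Z = np.column_stack([np.ones(len(y)), *exog])
    return np.linalg.lstsq(Z, y, rcond=None)[0]


# Hypothetical parameter values for (20.4.1)-(20.4.2); illustrative only.
B10, B12, G11, G12 = 1.0, 0.5, 0.5, 0.5   # income function
B20, B21 = 3.0, 0.4                        # money supply function

rng = np.random.default_rng(1)
n = 5000
X1, X2 = rng.normal(size=(2, n))
u1, u2 = rng.normal(size=(2, n))
# Solve the two structural equations jointly for Y1, then compute Y2.
Y1 = (B10 + B12 * B20 + G11 * X1 + G12 * X2 + u1 + B12 * u2) / (1 - B12 * B21)
Y2 = B20 + B21 * Y1 + u2

pi1 = reduced_form(Y1, X1, X2)   # [Pi10, Pi11, Pi12]
pi2 = reduced_form(Y2, X1, X2)   # [Pi20, Pi21, Pi22]

# Two ILS estimates of B21, one through each excluded exogenous variable.
b21_from_x1 = pi2[1] / pi1[1]
b21_from_x2 = pi2[2] / pi1[2]
print(f"{b21_from_x1:.4f}  {b21_from_x2:.4f}")
```

Both numbers should be close to the assumed 0.4, yet they are not identical in any finite sample, and ILS offers no rule for choosing between them.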

As a matter of practice, one may apply OLS to the money supply equation, but the estimates thus obtained will be inconsistent in view of the likely correlation between the stochastic explanatory variable $Y_1$ and the stochastic disturbance term $u_2$. Suppose, however, we find a "proxy" for the stochastic explanatory variable $Y_1$ such that, although "resembling" $Y_1$ (in the sense that it is highly correlated with $Y_1$), it is uncorrelated with $u_2$. Such a proxy is also known as an instrumental variable (see Chapter 17). If one can find such a proxy, OLS can be used straightforwardly to estimate the money supply function.

11 Intuitively this can be seen as follows: $E(\hat{\beta}_1) = \beta_1$ if $E(\hat{\Pi}_3/\hat{\Pi}_1) = \Pi_3/\Pi_1$. Now even if $E(\hat{\Pi}_3) = \Pi_3$ and $E(\hat{\Pi}_1) = \Pi_1$, it can be shown that, in general, $E(\hat{\Pi}_3/\hat{\Pi}_1) \neq E(\hat{\Pi}_3)/E(\hat{\Pi}_1)$; that is, the expectation of the ratio of two variables is not equal to the ratio of the expectations of the two variables. However, as shown in Appendix 20A.1, $\operatorname{plim}(\hat{\Pi}_3/\hat{\Pi}_1) = \operatorname{plim}(\hat{\Pi}_3)/\operatorname{plim}(\hat{\Pi}_1) = \Pi_3/\Pi_1$ since $\hat{\Pi}_3$ and $\hat{\Pi}_1$ are consistent estimators.
But how does one obtain such an instrumental variable? One answer is provided by two-stage least squares (2SLS), developed independently by Henri Theil¹² and Robert Basmann.¹³ As the name indicates, the method involves two successive applications of OLS. The process is as follows:

Stage 1. To get rid of the likely correlation between $Y_1$ and $u_2$, first regress $Y_1$ on all the predetermined variables in the whole system, not just those in that equation. In the present case, this means regressing $Y_1$ on $X_1$ and $X_2$ as follows:

$$Y_{1t} = \hat{\Pi}_0 + \hat{\Pi}_1 X_{1t} + \hat{\Pi}_2 X_{2t} + \hat{u}_t \tag{20.4.3}$$

where $\hat{u}_t$ are the usual OLS residuals. From Eq. (20.4.3) we obtain

$$\hat{Y}_{1t} = \hat{\Pi}_0 + \hat{\Pi}_1 X_{1t} + \hat{\Pi}_2 X_{2t} \tag{20.4.4}$$

where $\hat{Y}_{1t}$ is an estimate of the mean value of $Y_1$ conditional upon the fixed X's. Note that Eq. (20.4.3) is nothing but a reduced-form regression because only the exogenous or predetermined variables appear on the right-hand side.

Equation (20.4.3) can now be expressed as

$$Y_{1t} = \hat{Y}_{1t} + \hat{u}_t \tag{20.4.5}$$

which shows that the stochastic $Y_1$ consists of two parts: $\hat{Y}_{1t}$, which is a linear combination of the nonstochastic X's, and a random component $\hat{u}_t$. Following the OLS theory, $\hat{Y}_{1t}$ and $\hat{u}_t$ are uncorrelated. (Why?)

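Stage 2. The overidentified money supply equation can now be written as

$$
\begin{aligned}
Y_{2t} &= \beta_{20} + \beta_{21}(\hat{Y}_{1t} + \hat{u}_t) + u_{2t} \\
&= \beta_{20} + \beta_{21}\hat{Y}_{1t} + (u_{2t} + \beta_{21}\hat{u}_t) \\
&= \beta_{20} + \beta_{21}\hat{Y}_{1t} + u_t^*
\end{aligned}
\tag{20.4.6}
$$

where $u_t^* = u_{2t} + \beta_{21}\hat{u}_t$.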

Comparing Eq. (20.4.6) with Eq. (20.4.2), we see that they are very similar in appearance, the only difference being that $Y_1$ is replaced by $\hat{Y}_1$. What is the advantage of Eq. (20.4.6)? It can be shown that although $Y_1$ in the original money supply equation is correlated or likely to be correlated with the disturbance term $u_2$ (hence rendering OLS inappropriate), $\hat{Y}_{1t}$ in Eq. (20.4.6) is uncorrelated with $u_t^*$ asymptotically, that is, in large samples (or, more accurately, as the sample size increases indefinitely). As a result, OLS can be applied to Eq. (20.4.6), which will give consistent estimates of the parameters of the money supply function.¹⁴

12 Henri Theil, "Repeated Least-Squares Applied to Complete Equation Systems," The Hague: The 
Central Planning Bureau, The Netherlands, 1953 (mimeographed). 

13 Robert L. Basmann, "A Generalized Classical Method of Linear Estimation of Coefficients in a 
Structural Equation," Econometrica, vol. 25, 1957, pp. 77-83. 

14 But note that in small samples $\hat{Y}_{1t}$ is likely to be correlated with $u_t^*$. The reason is as follows: From Eq. (20.4.4) we see that $\hat{Y}_{1t}$ is a weighted linear combination of the predetermined X's, with the $\hat{\Pi}$'s as the weights. Now even if the predetermined variables are truly nonstochastic, the $\hat{\Pi}$'s, being estimators, are stochastic. Therefore, $\hat{Y}_{1t}$ is stochastic too. Now from our discussion of the reduced-form equations and indirect least-squares estimation, it is clear that the reduced-form coefficients, the $\hat{\Pi}$'s, are functions of the stochastic disturbances, such as $u_2$. And since $\hat{Y}_{1t}$ depends on the $\hat{\Pi}$'s, it is likely to be correlated with $u_2$, which is a component of $u_t^*$. As a result, $\hat{Y}_{1t}$ is expected to be correlated with $u_t^*$. But as noted previously, this correlation disappears as the sample size tends to infinity. The upshot of all this is that in small samples the 2SLS procedure may lead to biased estimation.
As this two-stage procedure indicates, the basic idea behind 2SLS is to “purify” the stochastic explanatory variable $Y_1$ of the influence of the stochastic disturbance $u_2$. This goal is accomplished by performing the reduced-form regression of $Y_1$ on all the predetermined variables in the system (Stage 1), obtaining the estimates $\hat{Y}_{1t}$, replacing $Y_{1t}$ in the original equation by the estimated $\hat{Y}_{1t}$, and then applying OLS to the equation thus transformed (Stage 2). The estimators thus obtained are consistent; that is, they converge to their true values as the sample size increases indefinitely.
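The two stages can be sketched in Python with NumPy. This is a minimal illustration, not the procedure of any particular package: the data are simulated from a hypothetical income-money supply system in which income depends on money supply and two predetermined variables while money supply depends only on income, and the parameter values and helper names (`add_const`, `ols`, `tsls`) are our own assumptions.

```python
import numpy as np


def add_const(*cols):
    """Stack 1-D arrays as columns, preceded by a column of ones (the intercept)."""
    return np.column_stack([np.ones(len(cols[0])), *cols])


def ols(y, X):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def tsls(y, y_endog, predetermined):
    """2SLS for y = b0 + b1 * y_endog + u, where y_endog is stochastic.

    Stage 1: regress y_endog on a constant and all predetermined variables.
    Stage 2: regress y on a constant and the Stage 1 fitted values.
    """
    Z = add_const(*predetermined)
    y_endog_hat = Z @ ols(y_endog, Z)   # the "purified" regressor
    return ols(y, add_const(y_endog_hat))


# Hypothetical system; the parameter values are made up for illustration:
#   Y1 = 1 + 0.5*Y2 + X1 + X2 + u1   (income function)
#   Y2 = 2 + 0.8*Y1 + u2             (money supply function)
rng = np.random.default_rng(42)
n = 10_000
x1, x2, u1, u2 = rng.normal(size=(4, n))
y1 = (2 + x1 + x2 + u1 + 0.5 * u2) / 0.6   # reduced form, solved by hand
y2 = 2 + 0.8 * y1 + u2

b_ols = ols(y2, add_const(y1))    # Y1 is correlated with u2, so OLS is inconsistent
b_2sls = tsls(y2, y1, [x1, x2])   # consistent
print("OLS slope: ", round(b_ols[1], 3))
print("2SLS slope:", round(b_2sls[1], 3))
```

With these settings the OLS slope should settle noticeably above the true value 0.8 (its probability limit works out to roughly 0.89), while the 2SLS slope should be close to 0.8.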

To illustrate 2SLS further, let us modify the income-money supply model as follows: 

$$Y_{1t} = \beta_{10} + \beta_{12}Y_{2t} + \gamma_{11}X_{1t} + \gamma_{12}X_{2t} + u_{1t} \tag{20.4.7}$$

$$Y_{2t} = \beta_{20} + \beta_{21}Y_{1t} + \gamma_{23}X_{3t} + \gamma_{24}X_{4t} + u_{2t} \tag{20.4.8}$$

where, in addition to the variables already defined, $X_3$ = income in the previous time period and $X_4$ = money supply in the previous period. Both $X_3$ and $X_4$ are predetermined.

It can be readily verified that both Eqs. (20.4.7) and (20.4.8) are overidentified. To apply 
2SLS, we proceed as follows: In Stage 1 we regress the endogenous variables on all the 
predetermined variables in the system. Thus, 

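$$Y_{1t} = \hat{\Pi}_{10} + \hat{\Pi}_{11}X_{1t} + \hat{\Pi}_{12}X_{2t} + \hat{\Pi}_{13}X_{3t} + \hat{\Pi}_{14}X_{4t} + \hat{u}_{1t} \tag{20.4.9}$$

$$Y_{2t} = \hat{\Pi}_{20} + \hat{\Pi}_{21}X_{1t} + \hat{\Pi}_{22}X_{2t} + \hat{\Pi}_{23}X_{3t} + \hat{\Pi}_{24}X_{4t} + \hat{u}_{2t} \tag{20.4.10}$$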

In Stage 2 we replace $Y_1$ and $Y_2$ in the original (structural) equations by their estimated values from the preceding two regressions and then run the OLS regressions as follows:

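$$Y_{1t} = \beta_{10} + \beta_{12}\hat{Y}_{2t} + \gamma_{11}X_{1t} + \gamma_{12}X_{2t} + u_{1t}^* \tag{20.4.11}$$

$$Y_{2t} = \beta_{20} + \beta_{21}\hat{Y}_{1t} + \gamma_{23}X_{3t} + \gamma_{24}X_{4t} + u_{2t}^* \tag{20.4.12}$$

where $u_{1t}^* = u_{1t} + \beta_{12}\hat{u}_{2t}$ and $u_{2t}^* = u_{2t} + \beta_{21}\hat{u}_{1t}$. The estimates thus obtained will be consistent.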
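When the equation also contains predetermined variables of its own, as Eqs. (20.4.11) and (20.4.12) do, Stage 2 replaces only the endogenous regressor and keeps the included predetermined variables as they are. The sketch below, on hypothetical simulated data, also checks that the two explicit stages give the same coefficients as the usual one-step matrix expression for 2SLS; that matrix form is the standard textbook formula, quoted here rather than derived in this chapter, and the function names are ours.

```python
import numpy as np


def tsls_two_step(y, endog, exog_included, Z):
    """2SLS run as two explicit OLS stages.

    endog:         (n, p) endogenous regressors appearing in the equation
    exog_included: (n, q) predetermined regressors in the equation, incl. the constant
    Z:             (n, K) all predetermined variables in the system, incl. the constant
    """
    endog_hat = Z @ np.linalg.lstsq(Z, endog, rcond=None)[0]   # Stage 1
    X_stage2 = np.hstack([endog_hat, exog_included])            # only endog is replaced
    return np.linalg.lstsq(X_stage2, y, rcond=None)[0]          # Stage 2


def tsls_one_step(y, X, Z):
    """Same estimator in matrix form: X_hat = Z(Z'Z)^{-1}Z'X, b = (X_hat'X)^{-1} X_hat'y."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)


# Hypothetical version of Eqs. (20.4.7)-(20.4.8) with made-up values
# beta10 = 1, beta12 = 0.5, gamma11 = gamma12 = 1, beta20 = 2, beta21 = 0.4,
# gamma23 = gamma24 = 1. X3 and X4 are simulated as exogenous noise for
# simplicity (in the text they are lagged variables).
rng = np.random.default_rng(0)
n = 5_000
x1, x2, x3, x4, u1, u2 = rng.normal(size=(6, n))
one = np.ones(n)
y1 = (2 + x1 + x2 + 0.5 * (x3 + x4) + u1 + 0.5 * u2) / 0.8   # reduced form
y2 = 2 + 0.4 * y1 + x3 + x4 + u2

Z = np.column_stack([one, x1, x2, x3, x4])
# Money supply equation (20.4.8); coefficient order: [beta21, beta20, gamma23, gamma24]
b_two = tsls_two_step(y2, y1[:, None], np.column_stack([one, x3, x4]), Z)
b_one = tsls_one_step(y2, np.column_stack([y1, one, x3, x4]), Z)
print(b_two.round(3))
print(np.allclose(b_two, b_one))
```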

Note the following features of 2SLS. 

1. It can be applied to an individual equation in the system without directly taking into account any other equation(s) in the system. Hence, for estimating econometric models involving a large number of equations, 2SLS offers an economical method. For this reason the method has been used extensively in practice.

2. Unlike ILS, which provides multiple estimates of parameters in the overidentified 
equations, 2SLS provides only one estimate per parameter. 

3. It is easy to apply because all one needs to know is the total number of exogenous or predetermined variables in the system without knowing any other variables in the system.

4. Although specially designed to handle overidentified equations, the method can also 
be applied to exactly identified equations. But then ILS and 2SLS will give identical 
estimates. (Why?) 
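A quick numerical check of feature 4 uses a hypothetical system in which the money supply equation excludes exactly one predetermined variable and is therefore exactly identified. ILS solves the two estimated reduced-form equations for the structural coefficients; 2SLS replaces $Y_1$ by its fitted value. The parameter values and the sample are made up for illustration.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
n = 200
x1, u1, u2 = rng.normal(size=(3, n))
# Hypothetical system in which the money supply equation is exactly identified:
#   Y1 = 1 + 0.5*Y2 + X1 + u1,   Y2 = 2 + 0.8*Y1 + u2
y1 = (2 + x1 + u1 + 0.5 * u2) / 0.6
y2 = 2 + 0.8 * y1 + u2
Z = np.column_stack([np.ones(n), x1])

# ILS: estimate both reduced-form equations, then solve for the structural coefficients
pi10, pi11 = ols(y1, Z)
pi20, pi21 = ols(y2, Z)
b21_ils = pi21 / pi11
b20_ils = pi20 - b21_ils * pi10

# 2SLS: replace Y1 by its Stage 1 fitted values and run OLS
y1_hat = Z @ ols(y1, Z)
b20_2sls, b21_2sls = ols(y2, np.column_stack([np.ones(n), y1_hat]))

print(b21_ils, b21_2sls)
print(b20_ils, b20_2sls)
```

The two sets of estimates agree apart from floating-point rounding, whatever seed is used.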

5. If the $R^2$ values in the reduced-form regressions (that is, Stage 1 regressions) are very high, say, in excess of 0.8, the classical OLS estimates and 2SLS estimates will be very close. But this result should not be surprising because if the $R^2$ value in the first stage is very high, it means that the estimated values of the endogenous variables are very close to their actual values, and hence the latter are less likely to be correlated with the stochastic disturbances in the original structural equations. (Why?) 15 If, however, the $R^2$ values in the first-stage regressions are very low, the 2SLS estimates will be practically meaningless because we shall be replacing the original $Y$'s in the second-stage regressions by the estimated $\hat{Y}$'s from the first-stage regressions, which will essentially represent the disturbances in the first-stage regressions. In other words, in this case, the $\hat{Y}$'s will be very poor proxies for the original $Y$'s.

15 In the extreme case of $R^2 = 1$ in the first-stage regression, the endogenous explanatory variable in the original (overidentified) equation will be practically nonstochastic (why?).
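The first-stage $R^2$ in feature 5 is simply the $R^2$ of the reduced-form regression. A minimal sketch with made-up data contrasts a predetermined variable that explains most of the variation in the endogenous regressor with one that explains almost none; the helper name `first_stage_r2` is ours.

```python
import numpy as np


def first_stage_r2(y_endog, predetermined):
    """R^2 of the Stage 1 regression of y_endog on a constant and the predetermined variables."""
    Z = np.column_stack([np.ones(len(y_endog)), *predetermined])
    coef, *_ = np.linalg.lstsq(Z, y_endog, rcond=None)
    resid = y_endog - Z @ coef
    return 1 - resid @ resid / np.sum((y_endog - y_endog.mean()) ** 2)


rng = np.random.default_rng(7)
n = 5_000
x, noise = rng.normal(size=(2, n))
strong = 3.0 * x + noise   # predetermined variable explains most of the variation
weak = 0.05 * x + noise    # predetermined variable explains almost nothing
print(round(first_stage_r2(strong, [x]), 3))   # expected to be near 0.9
print(round(first_stage_r2(weak, [x]), 3))     # expected to be near 0
```

In the weak case the fitted values carry almost no information about the endogenous regressor, which is the situation in which the text warns that 2SLS estimates become practically meaningless.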

6. Notice that in reporting the ILS regression in Eq. (20.3.15) we did not state the standard errors of the estimated coefficients (for reasons explained in footnote 10). But we can do this for the 2SLS estimates because the structural coefficients are directly estimated from the second-stage (OLS) regressions. There is, however, a caution to be exercised. The estimated standard errors in the second-stage regressions need to be modified because, as can be seen from Eq. (20.4.6), the error term $u_t^*$ is, in fact, the original error term $u_{2t}$ plus $\beta_{21}\hat{u}_{1t}$. Hence, the variance of $u_t^*$ is not exactly equal to the variance of the original $u_{2t}$. However, the required modification can easily be made with the formula given in Appendix 20A, Section 20A.2.
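The correction in feature 6 amounts to recomputing the residual variance with the actual $Y_1$ rather than $\hat{Y}_1$. The sketch below assumes that form (residuals $Y_{2t} - \hat{\beta}_{20} - \hat{\beta}_{21}Y_{1t}$, with the Stage 2 cross-product matrix kept); check it against the formula in Appendix 20A, Section 20A.2 before relying on it. The data come from a hypothetical simulated system and the function name is ours.

```python
import numpy as np


def tsls_with_se(y, y_endog, predetermined):
    """2SLS coefficients plus naive and corrected standard errors (a sketch).

    naive:     residual variance from the Stage 2 regression, i.e. computed with Y-hat
    corrected: residual variance recomputed with the actual Y, keeping (X_hat'X_hat)^{-1}
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), *predetermined])
    y_hat = Z @ np.linalg.lstsq(Z, y_endog, rcond=None)[0]
    X_hat = np.column_stack([np.ones(n), y_hat])
    X_act = np.column_stack([np.ones(n), y_endog])
    b = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    dof = n - X_hat.shape[1]
    diag_inv = np.diag(np.linalg.inv(X_hat.T @ X_hat))

    resid_naive = y - X_hat @ b   # what a plain Stage 2 OLS run would use
    resid_corr = y - X_act @ b    # residuals of the structural equation itself
    se_naive = np.sqrt(resid_naive @ resid_naive / dof * diag_inv)
    se_corr = np.sqrt(resid_corr @ resid_corr / dof * diag_inv)
    return b, se_naive, se_corr


# Hypothetical system (values made up): Y1 = 1 + 0.5*Y2 + X1 + X2 + u1, Y2 = 2 + 0.8*Y1 + u2
rng = np.random.default_rng(42)
n = 10_000
x1, x2, u1, u2 = rng.normal(size=(4, n))
y1 = (2 + x1 + x2 + u1 + 0.5 * u2) / 0.6
y2 = 2 + 0.8 * y1 + u2

b, se_naive, se_corr = tsls_with_se(y2, y1, [x1, x2])
print("coefficients:", b.round(3))
print("naive se:    ", se_naive.round(4))
print("corrected se:", se_corr.round(4))
```

In this simulation the naive standard errors come out larger than the corrected ones; with other data the adjustment can go either way, as the move from Eq. (20.5.2) to Eq. (20.5.3) shows for the slope.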

7. In using the 2SLS, bear in mind the following remarks of Henri Theil: 

The statistical justification of the 2SLS is of the large-sample type. When there are no lagged endogenous variables, . . . the 2SLS coefficient estimators are consistent if the exogenous variables are constant in repeated samples and if the disturbance[s] [appearing in the various behavioral or structural equations] . . . are independently and identically distributed with zero means and finite variances. . . . If these two conditions are satisfied, the sampling distribution of 2SLS coefficient estimators becomes approximately normal for large samples. . . .

When the equation system contains lagged endogenous variables, the consistency and large-sample normality of the 2SLS coefficient estimators require an additional condition, . . . that as the sample increases the mean square of the values taken by each lagged endogenous variable converges in probability to a positive limit. . . .

If [the disturbances appearing in the various structural equations are] not independently distributed, lagged endogenous variables are not independent of the current operation of the equation system, . . . which means these variables are not really predetermined. If these variables are nevertheless treated as predetermined in the 2SLS procedure, the resulting estimators are not consistent. 16


20.5 2SLS: A Numerical Example 

To illustrate the 2SLS method, consider the income-money supply model given previously 
in Eqs. (20.4.1) and (20.4.2). As shown, the money supply equation is overidentified. To 
estimate the parameters of this equation, we resort to the two-stage least-squares method. 
The data required for analysis are given in Table 20.2; this table also gives some data that 
are required to answer some of the questions given in the exercises. 

Stage 1 Regression 

We first regress the stochastic explanatory variable income $Y_1$, represented by GDP, on the predetermined variables private investment $X_1$ and government expenditure $X_2$, obtaining the following results:

$$\hat{Y}_{1t} = 2689.848 + 1.8700X_{1t} + 2.0343X_{2t} \tag{20.5.1}$$

se = (67.9874)   (0.1717)   (0.1075)

t = (39.5639)   (10.8938)   (18.9295)      $R^2 = 0.9964$


16 Henri Theil, Introduction to Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1978, pp. 341-342. 
TABLE 20.2

Observation   GDP (Y1)   M2 (Y2)  GPDI (X1)  FEDEXP (X2)  TB6 (X3)
1970           3,771.9     626.5      427.1        201.1     6.562
1971           3,898.6     710.3      475.7        220.0     4.511
1972           4,105.0     802.3      532.1        244.4     4.466
1973           4,341.5     855.5      594.4        261.7     7.178
1974           4,319.6     902.1      550.6        293.3     7.926
1975           4,311.2   1,016.2      453.1        346.2     6.122
1976           4,540.9   1,152.0      544.7        374.3     5.266
1977           4,750.5   1,270.3      627.0        407.5     5.510
1978           5,015.0   1,366.0      702.6        450.0     7.572
1979           5,173.4   1,473.7      725.0        497.5    10.017
1980           5,161.7   1,599.8      645.3        585.7    11.374
1981           5,291.7   1,755.4      704.9        672.7    13.776
1982           5,189.3   1,910.3      606.0        748.5    11.084
1983           5,423.8   2,126.5      662.5        815.4      8.75
1984           5,813.6   2,310.0      857.7        877.1      9.80
1985           6,053.7   2,495.7      849.7        948.2      7.66
1986           6,263.6   2,732.4      843.9      1,006.0      6.03
1987           6,475.1   2,831.4      870.0      1,041.6      6.05
1988           6,742.7   2,994.5      890.5      1,092.7      6.92
1989           6,981.4   3,158.5      926.2      1,167.5      8.04
1990           7,112.5   3,278.6      895.1      1,253.5      7.47
1991           7,100.5   3,379.1      822.2      1,315.0      5.49
1992           7,336.6   3,432.5      889.0      1,444.6      3.57
1993           7,532.7   3,484.0      968.3      1,496.0      3.14
1994           7,835.5   3,497.5    1,099.6      1,533.1      4.66
1995           8,031.7   3,640.4    1,134.0      1,603.5      5.59
1996           8,328.9   3,815.1    1,234.3      1,665.8      5.09
1997           8,703.5   4,031.6    1,387.7      1,708.9      5.18
1998           9,066.9   4,379.0    1,524.1      1,734.9      4.85
1999           9,470.3   4,641.1    1,642.6      1,787.6      4.76
2000           9,817.0   4,920.9    1,735.5      1,864.4      5.92
2001           9,890.7   5,430.3    1,598.4      1,969.5      3.39
2002          10,048.8   5,774.1    1,557.1      2,101.1      1.69
2003          10,301.0   6,062.0    1,613.1      2,252.1      1.06
2004          10,703.5   6,411.7    1,770.6      2,383.0      1.58
2005          11,048.6   6,669.4    1,866.3      2,555.9      3.40

Notes:
Y1 = GDP = gross domestic product (billions of chained 2000 dollars).
Y2 = M2 = M2 money supply (billions of dollars).
X1 = GPDI = gross private domestic investment (billions of chained 2000 dollars).
X2 = FEDEXP = Federal government expenditure (billions of dollars).
X3 = TB6 = 6-month Treasury bill rate (%).


Stage 2 Regression 

We now estimate the money supply function (20.4.2), replacing the endogenous variable $Y_1$ by $\hat{Y}_1$ estimated from Eq. (20.5.1). The results are as follows:

$$\hat{Y}_{2t} = -2440.180 + 0.79207\hat{Y}_{1t} \tag{20.5.2}$$

se = (127.3720)   (0.0178)

t = (-19.1579)   (44.5246)      $R^2 = 0.9831$
As we pointed out previously, the estimated standard errors given in Eq. (20.5.2) need to be corrected in the manner suggested in Appendix 20A, Section 20A.2. Effecting this correction (most econometric packages can now do it), we obtain the following results:

$$\hat{Y}_{2t} = -2440.180 + 0.79207\hat{Y}_{1t} \tag{20.5.3}$$

se = (126.9598)   (0.0212)

t = (-17.3149)   (37.3057)      $R^2 = 0.9803$

As noted in Appendix 20A, Section 20A.2, the standard errors given in Eq. (20.5.3) do not differ much from those given in Eq. (20.5.2) because the $R^2$ in the Stage 1 regression is very high.

OLS Regression 

For comparison, we give the regression of money stock on income as shown in Eq. (20.4.2) without “purging” the stochastic $Y_{1t}$ of the influence of the stochastic disturbance term.

$$\hat{Y}_{2t} = -2195.468 + 0.7911Y_{1t} \tag{20.5.4}$$

se = (126.6460)   (0.0211)

t = (-17.3354)   (37.3812)      $R^2 = 0.9803$

Comparing the “inappropriate” OLS results with the Stage 2 regression, we see that the two regressions are virtually the same. Does this mean that the 2SLS procedure is worthless? Not at all. That in the present situation the two results are practically identical should not be surprising because, as noted previously, the $R^2$ value in the first stage is very high, thus making the estimated $\hat{Y}_{1t}$ virtually identical with the actual $Y_{1t}$. Therefore, in this case the OLS and second-stage regressions will be more or less similar. But there is no guarantee that this will happen in every application. An implication, then, is that in overidentified equations one should not accept the classical OLS procedure without checking the second-stage regression(s).
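The numerical example can be reproduced from Table 20.2 with a short script. A sketch, assuming the table values below were transcribed correctly; compare the printed estimates with Eqs. (20.5.1), (20.5.2), and (20.5.4) rather than treating either as authoritative, since rounding or transcription differences are possible. The helper names are ours.

```python
import numpy as np

# Table 20.2, 1970-2005 (ten years per line; the last line covers 2000-2005)
gdp = np.array([
    3771.9, 3898.6, 4105.0, 4341.5, 4319.6, 4311.2, 4540.9, 4750.5, 5015.0, 5173.4,
    5161.7, 5291.7, 5189.3, 5423.8, 5813.6, 6053.7, 6263.6, 6475.1, 6742.7, 6981.4,
    7112.5, 7100.5, 7336.6, 7532.7, 7835.5, 8031.7, 8328.9, 8703.5, 9066.9, 9470.3,
    9817.0, 9890.7, 10048.8, 10301.0, 10703.5, 11048.6])
m2 = np.array([
    626.5, 710.3, 802.3, 855.5, 902.1, 1016.2, 1152.0, 1270.3, 1366.0, 1473.7,
    1599.8, 1755.4, 1910.3, 2126.5, 2310.0, 2495.7, 2732.4, 2831.4, 2994.5, 3158.5,
    3278.6, 3379.1, 3432.5, 3484.0, 3497.5, 3640.4, 3815.1, 4031.6, 4379.0, 4641.1,
    4920.9, 5430.3, 5774.1, 6062.0, 6411.7, 6669.4])
gpdi = np.array([
    427.1, 475.7, 532.1, 594.4, 550.6, 453.1, 544.7, 627.0, 702.6, 725.0,
    645.3, 704.9, 606.0, 662.5, 857.7, 849.7, 843.9, 870.0, 890.5, 926.2,
    895.1, 822.2, 889.0, 968.3, 1099.6, 1134.0, 1234.3, 1387.7, 1524.1, 1642.6,
    1735.5, 1598.4, 1557.1, 1613.1, 1770.6, 1866.3])
fedexp = np.array([
    201.1, 220.0, 244.4, 261.7, 293.3, 346.2, 374.3, 407.5, 450.0, 497.5,
    585.7, 672.7, 748.5, 815.4, 877.1, 948.2, 1006.0, 1041.6, 1092.7, 1167.5,
    1253.5, 1315.0, 1444.6, 1496.0, 1533.1, 1603.5, 1665.8, 1708.9, 1734.9, 1787.6,
    1864.4, 1969.5, 2101.1, 2252.1, 2383.0, 2555.9])


def ols(y, X):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def r_squared(y, X, coef):
    """Coefficient of determination for a regression that includes a constant."""
    resid = y - X @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


n = len(gdp)
Z = np.column_stack([np.ones(n), gpdi, fedexp])

# Stage 1: reduced-form regression of GDP on the predetermined variables (cf. Eq. 20.5.1)
pi = ols(gdp, Z)
gdp_hat = Z @ pi
r2_stage1 = r_squared(gdp, Z, pi)

# Stage 2: money supply on fitted GDP (cf. Eq. 20.5.2)
b_2sls = ols(m2, np.column_stack([np.ones(n), gdp_hat]))

# "Inappropriate" OLS: money supply on actual GDP (cf. Eq. 20.5.4)
b_ols = ols(m2, np.column_stack([np.ones(n), gdp]))

print("Stage 1 coefficients:", pi.round(4), " R^2:", round(r2_stage1, 4))
print("2SLS:", b_2sls.round(4))
print("OLS: ", b_ols.round(4))
```

Only the point estimates are computed here; the standard errors of Eq. (20.5.2) would still need the correction discussed above.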

Simultaneity between GDP and Money Supply 

Let us find out if GDP ($Y_1$) and money supply ($Y_2$) are mutually dependent. For this purpose we use the Hausman test of simultaneity discussed in Chapter 19.

First we regress GDP on $X_1$ (investment expenditure) and $X_2$ (government expenditure), the exogenous variables in the system (i.e., we estimate the reduced-form regression). From this regression we obtain the estimated GDP and the residuals $\hat{v}_t$, as suggested in Eq. (19.4.7). Then we regress money supply on estimated GDP and $\hat{v}_t$ to obtain the following results:

$$\hat{Y}_{2t} = -2198.297 + 0.7915\hat{Y}_{1t} + 0.6984\hat{v}_t \tag{20.5.5}$$

se = (129.0548)   (0.0215)   (0.2970)

t = (-17.0338)   (36.70016)   (2.3511)

Since the $t$ value of $\hat{v}_t$ is statistically significant (the $p$ value is 0.0263), we reject the hypothesis of no simultaneity; that is, money supply and GDP appear to be mutually dependent, which should not be surprising. (Note: Strictly speaking, this conclusion is valid only in large samples; technically, it is only valid as the sample size increases indefinitely.)
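The Hausman procedure just described can be sketched with the Table 20.2 data (GDP, M2, GPDI, FEDEXP). The t ratios use conventional OLS standard errors, and the function name `ols_t` is ours. Because $\hat{v}_t$ is orthogonal to the constant and to estimated GDP, the coefficient on estimated GDP in this regression coincides with the 2SLS slope, which gives a useful internal check.

```python
import numpy as np

# Table 20.2, 1970-2005
gdp = np.array([
    3771.9, 3898.6, 4105.0, 4341.5, 4319.6, 4311.2, 4540.9, 4750.5, 5015.0, 5173.4,
    5161.7, 5291.7, 5189.3, 5423.8, 5813.6, 6053.7, 6263.6, 6475.1, 6742.7, 6981.4,
    7112.5, 7100.5, 7336.6, 7532.7, 7835.5, 8031.7, 8328.9, 8703.5, 9066.9, 9470.3,
    9817.0, 9890.7, 10048.8, 10301.0, 10703.5, 11048.6])
m2 = np.array([
    626.5, 710.3, 802.3, 855.5, 902.1, 1016.2, 1152.0, 1270.3, 1366.0, 1473.7,
    1599.8, 1755.4, 1910.3, 2126.5, 2310.0, 2495.7, 2732.4, 2831.4, 2994.5, 3158.5,
    3278.6, 3379.1, 3432.5, 3484.0, 3497.5, 3640.4, 3815.1, 4031.6, 4379.0, 4641.1,
    4920.9, 5430.3, 5774.1, 6062.0, 6411.7, 6669.4])
gpdi = np.array([
    427.1, 475.7, 532.1, 594.4, 550.6, 453.1, 544.7, 627.0, 702.6, 725.0,
    645.3, 704.9, 606.0, 662.5, 857.7, 849.7, 843.9, 870.0, 890.5, 926.2,
    895.1, 822.2, 889.0, 968.3, 1099.6, 1134.0, 1234.3, 1387.7, 1524.1, 1642.6,
    1735.5, 1598.4, 1557.1, 1613.1, 1770.6, 1866.3])
fedexp = np.array([
    201.1, 220.0, 244.4, 261.7, 293.3, 346.2, 374.3, 407.5, 450.0, 497.5,
    585.7, 672.7, 748.5, 815.4, 877.1, 948.2, 1006.0, 1041.6, 1092.7, 1167.5,
    1253.5, 1315.0, 1444.6, 1496.0, 1533.1, 1603.5, 1665.8, 1708.9, 1734.9, 1787.6,
    1864.4, 1969.5, 2101.1, 2252.1, 2383.0, 2555.9])


def ols_t(y, X):
    """OLS coefficients and conventional t ratios."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    se = np.sqrt(resid @ resid / (n - k) * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / se


n = len(gdp)
one = np.ones(n)
Z = np.column_stack([one, gpdi, fedexp])

# Step 1: reduced-form regression of GDP on the exogenous variables
pi, *_ = np.linalg.lstsq(Z, gdp, rcond=None)
gdp_hat = Z @ pi
v_hat = gdp - gdp_hat

# Step 2: regress money supply on estimated GDP and the residuals (cf. Eq. 20.5.5)
coef, t = ols_t(m2, np.column_stack([one, gdp_hat, v_hat]))
print("coefficients:", coef.round(4))
print("t ratio of v_hat:", round(t[2], 3))
```

A significant t ratio on `v_hat` is the signal of simultaneity; like the text's conclusion, it has only large-sample justification.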
Hypothesis Testing

Suppose we want to test the hypothesis that income has no effect on money demand. Can 
we test this hypothesis with the usual t test from the estimated regression (20.5.2)? Yes, 
provided the sample is large and provided we correct the standard errors as shown in 
Eq. (20.5.3), we can use the t test to test the significance of an individual coefficient and the 
F test to test joint significance of two or more coefficients, using formula (8.4.7). 17 

What happens if the error term in a structural equation is autocorrelated and/or correlated with the error term in another structural equation in the system? A full answer to this question will take us beyond the scope of the book and is better left for the references (see the reference given in footnote 7). Nevertheless, estimation techniques (such as Zellner's SURE technique) do exist to handle these complications.

To conclude the discussion of our numerical example, it may be added that the various steps involved in the application of 2SLS are now routinely handled by software packages such as STATA and EViews. It was only for pedagogical reasons that we showed the details of 2SLS. See Exercise 20.15.

20.6 Illustrative Examples 

In this section we consider some applications of the simultaneous-equation methods. 


EXAMPLE 20.1 Advertising, Concentration, and Price Margins


To study the interrelationships among advertising, concentration (as measured by the 
concentration ratio), and price-cost margins, Allyn D. Strickland and Leonard W. Weiss 
formulated the following three-equation model. 18 

Advertising intensity function:

$$\text{Ad}/S = a_0 + a_1M + a_2(\text{CD}/S) + a_3C + a_4C^2 + a_5\,\text{Gr} + a_6\,\text{Dur} \tag{20.6.1}$$

Concentration function:

$$C = b_0 + b_1(\text{Ad}/S) + b_2(\text{MES}/S) \tag{20.6.2}$$

Price-cost margin function:

$$M = c_0 + c_1(K/S) + c_2\,\text{Gr} + c_3C + c_4\,\text{GD} + c_5(\text{Ad}/S) + c_6(\text{MES}/S) \tag{20.6.3}$$


where Ad = advertising expense
S = value of shipments
C = four-firm concentration ratio
CD = consumer demand
MES = minimum efficient scale
M = price/cost margin
Gr = annual rate of growth of industrial production
Dur = dummy variable for durable goods industry
K = capital stock
GD = measure of geographic dispersion of output


17 But take this precaution: The restricted and unrestricted RSS in the numerator must be calculated using predicted $Y$ (as in Stage 2 of 2SLS), and the RSS in the denominator is calculated using actual rather than predicted values of the regressors. For an accessible discussion of this point, see T. Dudley Wallace and J. Lew Silver, Econometrics: An Introduction, Addison-Wesley, Reading, Mass., 1988, Sec. 8.5.

18 See their "Advertising, Concentration, and Price-Cost Margins," Journal of Political Economy, vol. 84, no. 5, 1976, pp. 1109-1121.
EXAMPLE 20.1 (Continued)

By the order condition for identifiability, Eq. (20.6.2) is overidentified, whereas Eqs. (20.6.1) and (20.6.3) are exactly identified.
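The order-condition bookkeeping behind this classification can be made explicit. The sketch uses the order condition in the form $K - k \ge m - 1$ ($K$ = predetermined variables in the model; $k$ and $m$ = predetermined and endogenous variables in the equation). The variable counts are our reading of Eqs. (20.6.1) to (20.6.3), not the authors': we take CD/S, Gr, Dur, MES/S, K/S, and GD as the six predetermined variables and count $C^2$ as an additional endogenous variable in Eq. (20.6.1).

```python
def order_condition(K, k, m):
    """Classify an equation by comparing K - k with m - 1."""
    excluded, needed = K - k, m - 1
    if excluded < needed:
        return "underidentified"
    return "exactly identified" if excluded == needed else "overidentified"


K = 6  # assumed predetermined variables: CD/S, Gr, Dur, MES/S, K/S, GD
equations = {
    # name: (k = predetermined variables included, m = endogenous variables included)
    "20.6.1": (3, 4),   # CD/S, Gr, Dur | Ad/S, M, C, C^2 (C^2 counted as endogenous)
    "20.6.2": (1, 2),   # MES/S | C, Ad/S
    "20.6.3": (4, 3),   # K/S, Gr, GD, MES/S | M, C, Ad/S
}
for name, (k, m) in equations.items():
    print(name, order_condition(K, k, m))
```

Under these assumptions the count reproduces the classification stated in the text; the order condition is necessary but not sufficient, so the rank condition would still have to be checked.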

The data for the analysis came largely from the 1963 Census of Manufacturers and covered 408 of the 417 four-digit manufacturing industries. The three equations were first estimated by OLS, yielding the results shown in Table 20.3. To correct for the simultaneous-equation bias, the authors reestimated the model using 2SLS. The ensuing results are given in Table 20.4. We leave it to the reader to compare the two sets of results.


TABLE 20.3 OLS Estimates of Three Equations (t ratios in parentheses)

                   Dependent Variable
                   Ad/S                C                   M
                   Eq. (20.6.1)        Eq. (20.6.2)        Eq. (20.6.3)
Constant           -0.0314 (-7.45)     0.2638 (25.93)      0.1682 (17.15)
C                  0.0554 (3.56)       -                   0.0629 (2.89)
C²                 -0.0568 (-3.38)     -                   -
M                  0.1123 (9.84)       -                   -
CD/S               0.0257 (8.94)       -                   -
Gr                 0.0387 (1.64)       -                   0.2255 (2.61)
Dur                -0.0021 (-1.11)     -                   -
Ad/S               -                   1.1613 (3.3)        1.6536 (11.00)
MES/S              -                   4.1852 (18.99)      0.0686 (0.54)
K/S                -                   -                   0.1123 (8.03)
GD                 -                   -                   -0.0003 (-2.90)
R²                 0.374               0.485               0.402
df                 401                 405                 401


TABLE 20.4 Two-Stage Least-Squares Estimates of Three Equations (t ratios in parentheses)

                   Dependent Variable
                   Ad/S                C                   M
                   Eq. (20.6.1)        Eq. (20.6.2)        Eq. (20.6.3)
Constant           -0.0245 (-3.86)     0.2591 (21.30)      0.1736 (14.66)
C                  0.0737 (2.84)       -                   0.0377 (0.93)
C²                 -0.0643 (-2.64)     -                   -
M                  0.0544 (2.01)       -                   -
CD/S               0.0269 (8.96)       -                   -
Gr                 0.0539 (2.09)       -                   0.2336 (2.61)
Dur                -0.0018 (-0.93)     -                   -
Ad/S               -                   1.5347 (2.42)       1.6256 (5.52)
MES/S              -                   4.169 (18.84)       0.1720 (0.92)
K/S                -                   -                   0.1165 (7.30)
GD                 -                   -                   -0.0003 (-2.79)


EXAMPLE 20.2 Klein's Model I

In Example 18.6 we discussed briefly the pioneering model of Klein. Initially, the model was estimated for the period 1920-1941. The underlying data are given in Table 20.5; and OLS, reduced-form, and 2SLS estimates are given in Table 20.6. We leave it to the reader to interpret these results.










EXAMPLE 20.2 (Continued)

TABLE 20.5 Underlying Data for Klein's Model I*

Year      C      P      W      I    K-1      X     W'      G      T
1920   39.8   12.7   28.8    2.7  180.1   44.9    2.2    2.4    3.4
1921   41.9   12.4   25.5   -0.2  182.8   45.6    2.7    3.9    7.7
1922   45.0   16.9   29.3    1.9  182.6   50.1    2.9    3.2    3.9
1923   49.2   18.4   34.1    5.2  184.5   57.2    2.9    2.8    4.7
1924   50.6   19.4   33.9    3.0  189.7   57.1    3.1    3.5    3.8
1925   52.6   20.1   35.4    5.1  192.7   61.0    3.2    3.3    5.5
1926   55.1   19.6   37.4    5.6  197.8   64.0    3.3    3.3    7.0
1927   56.2   19.8   37.9    4.2  203.4   64.4    3.6    4.0    6.7
1928   57.3   21.1   39.2    3.0  207.6   64.5    3.7    4.2    4.2
1929   57.8   21.7   41.3    5.1  210.6   67.0    4.0    4.1    4.0
1930   55.0   15.6   37.9    1.0  215.7   61.2    4.2    5.2    7.7
1931   50.9   11.4   34.5   -3.4  216.7   53.4    4.8    5.9    7.5
1932   45.6    7.0   29.0   -6.2  213.3   44.3    5.3    4.9    8.3
1933   46.5   11.2   28.5   -5.1  207.1   45.1    5.6    3.7    5.4
1934   48.7   12.3   30.6   -3.0  202.0   49.7    6.0    4.0    6.8
1935   51.3   14.0   33.2   -1.3  199.0   54.4    6.1    4.4    7.2
1936   57.7   17.6   36.8    2.1  197.7   62.7    7.4    2.9    8.3
1937   58.7   17.3   41.0    2.0  199.8   65.0    6.7    4.3    6.7
1938   57.5   15.3   38.2   -1.9  201.8   60.9    7.7    5.3    7.4
1939   61.6   19.0   41.6    1.3  199.9   69.5    7.8    6.6    8.9
1940   65.0   21.1   45.0    3.3  201.2   75.7    8.0    7.4    9.6
1941   69.7   23.5   53.3    4.9  204.5   88.4    8.5   13.8   11.6

*Interpretation of column heads is listed in Example 18.6.
Source: These data are taken from G. S. Maddala, Econometrics, McGraw-Hill, New York, 1977, p. 238.

TABLE 20.6* OLS, Reduced-Form, and 2SLS Estimates of Klein's Model I

OLS:

$$C = 16.237 + (\ldots)P + (\ldots)(W + W') + (\ldots)P_{-1} \qquad R^2 = 0.978,\ DW = 1.367$$
se = (1.203)   (0.091)   (0.040)   (0.090)

$$I = 10.125 + 0.479P + 0.333P_{-1} - 0.112K_{-1} \qquad R^2 = 0.919,\ DW = 1.810$$
se = (5.465)   (0.097)   (0.100)   (0.026)

$$W = 0.064 + 0.439X + 0.146X_{-1} + 0.130t \qquad R^2 = 0.985,\ DW = 1.958$$
se = (1.151)   (0.032)   (0.037)   (0.031)

Reduced-form:

$$P = 46.383 + 0.813P_{-1} - 0.213K_{-1} + 0.015X_{-1} + 0.297t - 0.926T + 0.443G \qquad R^2 = 0.753,\ DW = 1.854$$
se = (10.870)   (0.444)   (0.067)   (0.252)   (0.154)   (0.385)   (0.373)

$$W + W' = 40.278 + 0.823P_{-1} - 0.144K_{-1} + 0.115X_{-1} + 0.881t - 0.567T + 0.859G \qquad R^2 = 0.949,\ DW = 2.395$$
se = (8.787)   (0.359)   (0.054)   (0.204)   (0.124)   (0.311)   (0.302)

$$X = 78.281 + 1.724P_{-1} - 0.319K_{-1} + 0.094X_{-1} + 0.878t - 0.565T + 1.317G \qquad R^2 = 0.882,\ DW = 2.049$$
se = (18.860)   (0.771)   (0.110)   (0.438)   (0.267)   (0.669)   (0.648)

2SLS:

$$C = 16.543 + 0.019P + 0.810(W + W') + 0.214P_{-1} \qquad R^2 = 0.9726$$
se = (1.464)   (0.130)   (0.044)   (0.118)

$$I = 20.284 + 0.149P + 0.616P_{-1} - 0.157K_{-1} \qquad R^2 = 0.8643$$
se = (8.361)   (0.191)   (0.180)   (0.040)

$$W = 0.065 + 0.438X + 0.146X_{-1} + 0.130t \qquad R^2 = 0.9852$$
se = (1.894)   (0.065)   (0.070)   (0.053)

*Interpretation of variables is listed in Example 18.6 (standard errors in parentheses).
Source: G. S. Maddala, Econometrics, McGraw-Hill, New York, 1977, p. 242.
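Table 20.5 can be checked against the accounting identities of Klein's Model I. The sketch assumes the identities in the form X = C + I + G, P = X - W - T, and K = K(-1) + I (our statement of the model, since Example 18.6 is not reproduced here); all three should hold up to rounding in the tabulated data.

```python
import numpy as np

# Table 20.5, 1920-1941
C = np.array([39.8, 41.9, 45.0, 49.2, 50.6, 52.6, 55.1, 56.2, 57.3, 57.8, 55.0,
              50.9, 45.6, 46.5, 48.7, 51.3, 57.7, 58.7, 57.5, 61.6, 65.0, 69.7])
P = np.array([12.7, 12.4, 16.9, 18.4, 19.4, 20.1, 19.6, 19.8, 21.1, 21.7, 15.6,
              11.4, 7.0, 11.2, 12.3, 14.0, 17.6, 17.3, 15.3, 19.0, 21.1, 23.5])
W = np.array([28.8, 25.5, 29.3, 34.1, 33.9, 35.4, 37.4, 37.9, 39.2, 41.3, 37.9,
              34.5, 29.0, 28.5, 30.6, 33.2, 36.8, 41.0, 38.2, 41.6, 45.0, 53.3])
I = np.array([2.7, -0.2, 1.9, 5.2, 3.0, 5.1, 5.6, 4.2, 3.0, 5.1, 1.0,
              -3.4, -6.2, -5.1, -3.0, -1.3, 2.1, 2.0, -1.9, 1.3, 3.3, 4.9])
K_lag = np.array([180.1, 182.8, 182.6, 184.5, 189.7, 192.7, 197.8, 203.4, 207.6, 210.6, 215.7,
                  216.7, 213.3, 207.1, 202.0, 199.0, 197.7, 199.8, 201.8, 199.9, 201.2, 204.5])
X = np.array([44.9, 45.6, 50.1, 57.2, 57.1, 61.0, 64.0, 64.4, 64.5, 67.0, 61.2,
              53.4, 44.3, 45.1, 49.7, 54.4, 62.7, 65.0, 60.9, 69.5, 75.7, 88.4])
G = np.array([2.4, 3.9, 3.2, 2.8, 3.5, 3.3, 3.3, 4.0, 4.2, 4.1, 5.2,
              5.9, 4.9, 3.7, 4.0, 4.4, 2.9, 4.3, 5.3, 6.6, 7.4, 13.8])
T = np.array([3.4, 7.7, 3.9, 4.7, 3.8, 5.5, 7.0, 6.7, 4.2, 4.0, 7.7,
              7.5, 8.3, 5.4, 6.8, 7.2, 8.3, 6.7, 7.4, 8.9, 9.6, 11.6])

# Accounting identities of Klein's Model I (our statement, used here as a data check):
#   X = C + I + G,   P = X - W - T,   K_t = K_{t-1} + I_t
demand_gap = X - (C + I + G)
profit_gap = P - (X - W - T)
capital_gap = K_lag[1:] - (K_lag[:-1] + I[:-1])   # next year's K(-1) equals this year's K
print("largest discrepancies:",
      np.abs(demand_gap).max(), np.abs(profit_gap).max(), np.abs(capital_gap).max())
```

A check of this kind is a cheap way to catch transcription errors before the data are used for the OLS, reduced-form, or 2SLS regressions of Table 20.6.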
EXAMPLE 20.3 The Capital Asset Pricing Model Expressed as a Recursive System


In a rather unusual application of recursive simultaneous-equation modeling, Cheng F. Lee 
and W. P. Lloyd 19 estimated the following model for the oil industry: 

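$$R_{1t} = \alpha_1 + \gamma_1 M_t + u_{1t}$$

$$R_{2t} = \alpha_2 + \beta_{21}R_{1t} + \gamma_2 M_t + u_{2t}$$

$$R_{3t} = \alpha_3 + \beta_{31}R_{1t} + \beta_{32}R_{2t} + \gamma_3 M_t + u_{3t}$$

$$R_{4t} = \alpha_4 + \beta_{41}R_{1t} + \beta_{42}R_{2t} + \beta_{43}R_{3t} + \gamma_4 M_t + u_{4t}$$

$$R_{5t} = \alpha_5 + \beta_{51}R_{1t} + \beta_{52}R_{2t} + \beta_{53}R_{3t} + \beta_{54}R_{4t} + \gamma_5 M_t + u_{5t}$$

$$R_{6t} = \alpha_6 + \beta_{61}R_{1t} + \beta_{62}R_{2t} + \beta_{63}R_{3t} + \beta_{64}R_{4t} + \beta_{65}R_{5t} + \gamma_6 M_t + u_{6t}$$

$$R_{7t} = \alpha_7 + \beta_{71}R_{1t} + \beta_{72}R_{2t} + \beta_{73}R_{3t} + \beta_{74}R_{4t} + \beta_{75}R_{5t} + \beta_{76}R_{6t} + \gamma_7 M_t + u_{7t}$$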


where $R_1$ = rate of return on security 1 (= Imperial Oil)
$R_2$ = rate of return on security 2 (= Sun Oil)
⋮
$R_7$ = rate of return on security 7 (= Standard of Indiana)
$M_t$ = rate of return on the market index
$u_{it}$ = disturbances ($i$ = 1, 2, . . . , 7)

Before we present the results, the obvious question is: How do we choose which is security 1, which is security 2, and so on? Lee and Lloyd answer this question purely empirically. They regress the rate of return on security $i$ on the rates of return of the remaining six securities and observe the resulting $R^2$. Thus, there will be seven such regressions. Then they order the estimated $R^2$ values, from the lowest to the highest. The security having the lowest $R^2$ is designated as security 1 and the one having the highest $R^2$ is designated as security 7. The idea behind this is intuitively simple. If the $R^2$ of the rate of return of, say, Imperial Oil is lowest with respect to the other six securities, it would suggest that this security is affected least by the movements in the returns of the other securities. Therefore, the causal ordering, if any, runs from this security to the others and there is no feedback from the other securities.

Although one may object to such a purely empirical approach to causal ordering, let us 
present their empirical results nonetheless, which are given in Table 20.7. 

In Exercise 5.5 we introduced the characteristic line of modern investment theory, which is simply the regression of the rate of return on security $i$ on the market rate of return. The slope coefficient, known as the beta coefficient, is a measure of the volatility of the security's return. What the Lee-Lloyd regression results suggest is that there are significant intra-industry relationships between security returns, apart from the common market influence represented by the market portfolio. Thus, Standard of Indiana's return depends not only on the market rate of return but also on the rates of return on Shell Oil, Phillips Petroleum, and Union Oil. To put the matter differently, the movement in the rate of return on Standard of Indiana can be better explained if in addition to the market rate of return we also consider the rates of return experienced by Shell Oil, Phillips Petroleum, and Union Oil.
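The characteristic line, and the point that an intra-industry return can add explanatory power beyond the market, can both be illustrated with simulated data. The return series below are hypothetical and merely named after two of the companies in Table 20.7; the coefficients 0.6, 0.3, and 0.8 are made up, and `fit` is our own helper.

```python
import numpy as np


def fit(y, X):
    """OLS coefficients and R^2 of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


# Hypothetical returns; the coefficients are illustrative, not the Lee-Lloyd estimates.
rng = np.random.default_rng(5)
n = 20_000
market, e_shell, e_ind = rng.normal(size=(3, n))
shell = 0.8 * market + e_shell
indiana = 0.1 + 0.6 * market + 0.3 * shell + e_ind
one = np.ones(n)

coef_m, r2_m = fit(indiana, np.column_stack([one, market]))          # characteristic line
coef_f, r2_f = fit(indiana, np.column_stack([one, market, shell]))   # add an intra-industry return
print("beta, market only:", round(coef_m[1], 3), " R^2:", round(r2_m, 3))
print("with Shell added: ", coef_f.round(3), " R^2:", round(r2_f, 3))
```

With the market return alone, the estimated beta absorbs the indirect effect that runs through Shell's return (0.6 + 0.3 × 0.8 = 0.84 in this setup); adding Shell's return separates the two effects, and $R^2$ cannot fall when a regressor is added.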




19 "The Capital Asset Pricing Model Expressed as a Recursive System: An Empirical Investigation," Journal of Financial and Quantitative Analysis, June 1976, pp. 237-249.
EXAMPLE 20.3 (Continued)

TABLE 20.7 Recursive System Estimates for the Oil Industry (Linear Form)

                    Dependent Variables
                    Standard   Shell      Phillips   Union      Standard   Sun        Imperial
                    of Indiana Oil        Petroleum  Oil        of Ohio    Oil        Oil
Shell Oil           0.2100*
                    (2.859)
Phillips Petroleum  0.2293*    0.0791
                    (2.176)    (1.065)
Union Oil           0.1754*    0.2171*    0.2225*
                    (2.472)    (3.177)    (2.337)
Standard of Ohio    -0.0794    0.0147     0.4248*    0.1468*
                    (-1.294)   (0.235)    (5.501)    (1.735)
Sun Oil             0.1249     0.1710*    0.0472     0.1339     0.0499
                    (1.343)    (1.843)    (0.355)    (0.908)    (0.271)
Imperial Oil        -0.1077    0.0526     0.0354     0.1580     -0.2541*   0.0828
                    (-1.412)   (0.6804)   (0.319)    (1.290)    (-1.691)   (0.971)
Constant            0.0868     -0.0384    -0.0127    -0.2034    0.3009     0.2013     0.3710*
                    (0.681)    (1.296)    (-0.068)   (0.986)    (1.204)    (1.399)    (2.161)
Market index        0.3681*    0.4997*    0.2884     0.7609*    0.9089*    0.7161*    0.6432*
                    (2.165)    (3.039)    (1.232)    (3.069)    (3.094)    (4.783)    (3.774)
R²                  0.5020     0.4658     0.4106     0.2532     0.0985     0.2404     0.1247
Durbin-Watson       2.1083     2.4714     2.2306     2.3468     2.2181     2.3109     1.9592

*Denotes significance at 0.10 level or better for two-tailed test.


EXAMPLE 20.4 Revised Form of St. Louis Model²⁰

The well-known, and often controversial, St. Louis model originally developed in the late 1960s has been revised from time to time. One such revision is given in Table 20.8, and the empirical results based on this revised model are given in Table 20.9. (Note: A dot over a variable means the growth rate of that variable.) The model basically consists of Eqs. (1), (2), (4), and (5) in Table 20.8, the other equations representing the definitions. Equation (1) was estimated by OLS. Equations (1), (2), and (4) were estimated using the Almon distributed-lag method with (endpoint) constraints on the coefficients. Where relevant, the equations were corrected for first-order ($\rho_1$) and/or second-order ($\rho_2$) serial correlation.

Examining the results, we observe that it is the rate of growth in the money supply that primarily determines the rate of growth of (nominal) GNP and not the rate of growth in high-employment expenditures. The sum of the M coefficients is 1.06, suggesting that a 1 percent (sustained) increase in the money supply on average leads to about a 1.06 percent increase in nominal GNP. On the other hand, the sum of the E coefficients, about 0.05, suggests that a change in high-employment government expenditure has little impact on the rate of growth of nominal GNP. It is left to the reader to interpret the results of the other regressions reported in Table 20.9.
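The long-run figures quoted above are simply sums of the distributed-lag coefficients on money and on high-employment expenditures in Eq. (1) of Table 20.9, as a short computation confirms.

```python
# Distributed-lag coefficients of Eq. (1) in Table 20.9, lags 0 through 4
m_coefs = [0.40, 0.39, 0.22, 0.06, -0.01]   # money stock growth
e_coefs = [0.06, 0.02, -0.02, -0.02, 0.01]  # high-employment expenditure growth

# The cumulative effect of a sustained change is the sum of the lag coefficients.
m_total = round(sum(m_coefs), 2)
e_total = round(sum(e_coefs), 2)
print(m_total, e_total)   # 1.06 0.05
```

Rounding to two decimals matches the precision at which the coefficients are reported.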


20 Federal Reserve Bank of St. Louis, Review, May 1982, p. 14.







EXAMPLE 20.4 (Continued)

TABLE 20.8 The St. Louis Model

(1) $\dot{Y}_t = C1 + \sum_{i=0}^{4} CM_i(\dot{M}_{t-i}) + \sum_{i=0}^{4} CE_i(\dot{E}_{t-i}) + \varepsilon 1_t$

(2) $\dot{P}_t = C2 + \sum_{i=1}^{4} CPE_i(\dot{PE}_{t-i}) + \sum_{i=0}^{5} CD_i(\dot{X}_{t-i} - XF^*_{t-i}) + CPA(PA_t) + CDUM1(DUM1) + CDUM2(DUM2) + \varepsilon 2_t$

(3) $PA_t = \sum_{i=1}^{21} CPRL_i(\dot{P}_{t-i})$

(4) $RL_t = C3 + \sum_{i=0}^{20} CPRL_i(\dot{P}_{t-i}) + \varepsilon 3_t$

(5) $U_t - UF_t = CG(GAP_t) + CG1(GAP_{t-1}) + \varepsilon 4_t$

(6) $Y_t = (P_t/100)(X_t)$

(7) $\dot{Y}_t = [(Y_t/Y_{t-1})^4 - 1]100$

(8) $\dot{X}_t = [(X_t/X_{t-1})^4 - 1]100$

(9) $\dot{P}_t = [(P_t/P_{t-1})^4 - 1]100$

(10) $GAP_t = [(XF_t - X_t)/XF_t]100$

(11) $XF^*_t = [(XF_t/X_{t-1})^4 - 1]100$

where M = money stock (M1)
E = high-employment expenditures
P = GNP deflator (1972 = 100)
PE = relative price of energy

TABLE 20.9 In-Sample Estimation: 1960-I to 1980-IV (absolute value of t statistic in parentheses)

(1) $\dot{Y}_t = 2.44 + 0.40\dot{M}_t + 0.39\dot{M}_{t-1} + 0.22\dot{M}_{t-2} + 0.06\dot{M}_{t-3} - 0.01\dot{M}_{t-4}$
      (2.15)   (3.38)   (5.06)   (2.18)   (0.82)   (0.11)
    $+ 0.06\dot{E}_t + 0.02\dot{E}_{t-1} - 0.02\dot{E}_{t-2} - 0.02\dot{E}_{t-3} + 0.01\dot{E}_{t-4}$
      (1.46)   (0.63)   (0.57)   (0.52)   (0.34)
    $R^2 = 0.39$   se = 3.50   DW = 2.02

(2) $\dot{P}_t = 0.96 + 0.01\dot{PE}_{t-1} + 0.04\dot{PE}_{t-2} - 0.01\dot{PE}_{t-3} + 0.02\dot{PE}_{t-4}$
      (2.53)   (0.75)   (1.96)   (0.73)   (1.38)
    $- 0.00(\dot{X}_t - XF^*_t) + 0.01(\dot{X}_{t-1} - XF^*_{t-1}) + 0.02(\dot{X}_{t-2} - XF^*_{t-2})$
      (0.18)   (1.43)   (4.63)
    $+ 0.02(\dot{X}_{t-3} - XF^*_{t-3}) + 0.02(\dot{X}_{t-4} - XF^*_{t-4}) + 0.01(\dot{X}_{t-5} - XF^*_{t-5})$
      (3.00)   (2.42)   (2.16)
    $+ 1.03(PA_t) - 0.61(DUM1_t) + 1.65(DUM2_t)$
      (10.49)   (1.02)   (2.71)
    $R^2 = 0.80$   se = 1.28   DW = 1.97   $\rho = 0.12$

(4) $RL_t = 2.97 + 0.96\sum_{i=0}^{20}\dot{P}_{t-i}$
      (3.12)   (5.22)
    $R^2 = 0.32$   se = 0.33   DW = 1.76   $\rho = 0.94$

(5) $U_t - UF_t = 0.28(GAP_t) + 0.14(GAP_{t-1})$
      (11.89)   (6.31)
    $R^2 = 0.63$   se = 0.17   DW = 1.95   $\rho_1 = 1.43$   $\rho_2 = 0.52$

Source: Federal Reserve Bank of St. Louis, Review, May 1982, p. 14.
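Definitions (7) to (9) and (11) of Table 20.8 convert a quarter-to-quarter change into a compounded annual rate in percent, which is how the dotted growth rates in the model are measured. A minimal sketch of that formula; the function name is ours.

```python
def annualized_growth(level_now, level_prev):
    """Annualized quarterly growth rate in percent: [(x_t / x_{t-1})^4 - 1] * 100."""
    return ((level_now / level_prev) ** 4 - 1) * 100


# A 1 percent rise in one quarter corresponds to about 4.06 percent at an annual rate.
print(round(annualized_growth(101.0, 100.0), 4))   # 4.0604
```

Compounding over four quarters is why the annual rate slightly exceeds four times the quarterly change.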
Summary and Conclusions

1. Assuming that an equation in a simultaneous-equation model is identified (either exactly or over-), we have several methods to estimate it.

2. These methods fall into two broad categories: single-equation methods and systems methods.

3. For reasons of economy, specification errors, etc., the single-equation methods are by far the most popular. A unique feature of these methods is that one can estimate a single equation in a multiequation model without worrying too much about other equations in the system. (Note: For identification purposes, however, the other equations in the system count.)

4. Three commonly used single-equation methods are OLS, ILS, and 2SLS.

5. Although OLS is, in general, inappropriate in the context of simultaneous-equation models, it can be applied to the so-called recursive models where there is a definite but unidirectional cause-and-effect relationship among the endogenous variables.

6. The method of ILS is suited for just or exactly identified equations. In this method OLS is applied to the reduced-form equations, and it is from the reduced-form coefficients that one estimates the original structural coefficients.

7. The method of 2SLS is especially designed for overidentified equations, although it can also be applied to exactly identified equations. But then the results of 2SLS and ILS are identical. The basic idea behind 2SLS is to replace the (stochastic) endogenous explanatory variable by a linear combination of the predetermined variables in the model and use this combination as the explanatory variable in lieu of the original endogenous variable. The 2SLS method thus resembles the instrumental variable method of estimation in that the linear combination of the predetermined variables serves as an instrument, or proxy, for the endogenous regressor.

8. A noteworthy feature of both ILS and 2SLS is that the estimates obtained are consistent, that is, as the sample size increases indefinitely, the estimates converge to their true population values. The estimates may not satisfy small-sample properties, such as unbiasedness and minimum variance. Therefore, the results obtained by applying these methods to small samples and the inferences drawn from them should be interpreted with due caution.

EXERCISES

Questions

20.1. State whether each of the following statements is true or false:

a. The method of OLS is not applicable to estimate a structural equation in a simultaneous-equation model.

b. In case an equation is not identified, 2SLS is not applicable.

c. The problem of simultaneity does not arise in a recursive simultaneous-equation model.

d. The problems of simultaneity and exogeneity mean the same thing.

e. The 2SLS and other methods of estimating structural equations have desirable statistical properties only in large samples.

f. There is no such thing as an $R^2$ for the simultaneous-equation model as a whole.

*g. The 2SLS and other methods of estimating structural equations are not applicable if the equation errors are autocorrelated and/or are correlated across equations.

h. If an equation is exactly identified, ILS and 2SLS give identical results.

*Optional.
20.2. Why is it unnecessary to apply the two-stage least-squares method to exactly identified equations?

20.3. Consider the following modified Keynesian model of income determination: 

$$C_t = \beta_{10} + \beta_{11}Y_t + u_{1t}$$

$$I_t = \beta_{20} + \beta_{21}Y_t + \beta_{22}Y_{t-1} + u_{2t}$$

$$Y_t = C_t + I_t + G_t$$

where C = consumption expenditure
I = investment expenditure
Y = income
G = government expenditure
$G_t$ and $Y_{t-1}$ are assumed predetermined.

a. Obtain the reduced-form equations and determine which of the preceding 
equations are identified (either just or over-). 

b. Which method will you use to estimate the parameters of the overidentified 
equation and of the exactly identified equation? Justify your answer. 

20.4. Consider the following results:* 

OLS: $W_t = 0.276 + 0.258 P_t + 0.046 P_{t-1} + 4.959 V_t \qquad R^2 = 0.924$

OLS: $P_t = 2.693 + 0.232 W_t - 0.544 X_t + 0.247 M_t + 0.064 M_{t-1} \qquad R^2 = 0.982$

2SLS: $W_t = 0.272 + 0.257 P_t + 0.046 P_{t-1} + 4.966 V_t \qquad R^2 = 0.920$

2SLS: $P_t = 2.686 + 0.233 W_t - 0.544 X_t + 0.246 M_t + 0.046 M_{t-1} \qquad R^2 = 0.981$

where $W_t$, $P_t$, $M_t$, and $X_t$ are percentage changes in earnings, prices, import
prices, and labor productivity, respectively (all percentage changes are over the previous
year), and where $V_t$ represents unfilled job vacancies (percentage of total
number of employees).

“Since the OLS and 2SLS results are practically identical, 2SLS is meaningless.” 
Comment. 

20.5.† Assume that production is characterized by the Cobb-Douglas production function

$Q_i = A K_i^{\alpha} L_i^{\beta}$

where Q = output
K = capital input
L = labor input
A, α, and β = parameters
i = ith firm

Given the price of final output P, the price of labor W, and the price of capital R,
and assuming profit maximization, we obtain the following empirical model of
production:

Production function:

$\ln Q_i = \ln A + \alpha \ln K_i + \beta \ln L_i + \ln u_{1i}$ (1)

*Source: Prices and Earnings in 1951-1969: An Econometric Assessment, Department of Employment,
United Kingdom, Her Majesty's Stationery Office, London, 1971, p. 30.
†Optional.

Marginal product of labor function:

$\ln Q_i = -\ln \beta + \ln L_i + \ln \frac{W}{P} + \ln u_{2i}$ (2)

Marginal product of capital function:

$\ln Q_i = -\ln \alpha + \ln K_i + \ln \frac{R}{P} + \ln u_{3i}$ (3)

where $u_1$, $u_2$, and $u_3$ are stochastic disturbances.

In the preceding model there are three equations in three endogenous variables 
Q, L, and K. P, R, and W are exogenous. 

a. What problems do you encounter in estimating the model if α + β = 1, that is,
when there are constant returns to scale?

b. Even if α + β ≠ 1, can you estimate the equations? Answer by considering the
identifiability of the system.

c. If the system is not identified, what can be done to make it identifiable? 

Note: Equations (2) and (3) are obtained by differentiating Q with respect to labor 
and capital, respectively, setting them equal to W/P and R/P, transforming the 
resulting expressions into logarithms, and adding (the logarithm of) the disturbance 
terms. 

20.6. Consider the following demand-and-supply model for money: 

Demand for money: $M_t^d = \beta_0 + \beta_1 Y_t + \beta_2 R_t + \beta_3 P_t + u_{1t}$
Supply of money: $M_t^s = \alpha_0 + \alpha_1 Y_t + u_{2t}$

where M = money
Y = income
R = rate of interest
P = price

Assume that R and P are predetermined. 

a. Is the demand function identified? 

b. Is the supply function identified? 

c. Which method would you use to estimate the parameters of the identified 
equation(s)? Why? 

d. Suppose we modify the supply function by adding the explanatory variables $Y_{t-1}$
and $M_{t-1}$. What happens to the identification problem? Would you still use the
method you used in (c)? Why or why not?

20.7. Refer to Exercise 18.10. For the two-equation system there obtain the reduced-form
equations and estimate their parameters. Estimate the indirect least-squares regression
of consumption on income and compare your results with the OLS regression.

Empirical Exercises 

20.8. Consider the following model: 

$R_t = \beta_0 + \beta_1 M_t + \beta_2 Y_t + u_{1t}$
$Y_t = \alpha_0 + \alpha_1 R_t + u_{2t}$

where $M_t$ (money supply) is exogenous, $R_t$ is the interest rate, and $Y_t$ is GDP.

a. How would you justify the model?

b. Are the equations identified?

c. Using the data given in Table 20.2, estimate the parameters of the identified
equations. Justify the method(s) you use.

20.9. Suppose we change the model in Exercise 20.8 as follows:

$R_t = \beta_0 + \beta_1 M_t + \beta_2 Y_t + \beta_3 Y_{t-1} + u_{1t}$
$Y_t = \alpha_0 + \alpha_1 R_t + u_{2t}$

a. Find out if the system is identified. 

b. Using the data given in Table 20.2, estimate the parameters of the identified 
equation(s). 

20.10. Consider the following model: 

$R_t = \beta_0 + \beta_1 M_t + \beta_2 Y_t + u_{1t}$
$Y_t = \alpha_0 + \alpha_1 R_t + \alpha_2 I_t + u_{2t}$

where the variables are as defined in Exercise 20.8. Treating I (domestic investment)
and M exogenously, determine the identification of the system. Using the
data given in Table 20.2, estimate the parameters of the identified equation(s).

20.11. Suppose we change the model of Exercise 20.10 as follows: 

$R_t = \beta_0 + \beta_1 M_t + \beta_2 Y_t + u_{1t}$
$Y_t = \alpha_0 + \alpha_1 R_t + \alpha_2 I_t + u_{2t}$
$I_t = \gamma_0 + \gamma_1 R_t + u_{3t}$

Assume that M is determined exogenously. 

a. Find out which of the equations are identified. 

b. Estimate the parameters of the identified equation(s) using the data given in 
Table 20.2. Justify your method(s). 

20.12. Verify the standard errors reported in Eq. (20.5.3). 

20.13. Return to the demand-and-supply model given in Eqs. (20.3.1) and (20.3.2). 
Suppose the supply function is altered as follows: 

$Q_t = \beta_0 + \beta_1 P_{t-1} + u_{2t}$

where $P_{t-1}$ is the price prevailing in the previous period.

a. If X (expenditure) and $P_{t-1}$ are predetermined, is there a simultaneity problem?

b. If there is, are the demand and supply functions each identified? If they are, obtain 
their reduced-form equations and estimate them from the data given in Table 20.1. 

c. From the reduced-form coefficients, can you derive the structural coefficients? 
Show the necessary computations. 

20.14. Class Exercise: Consider the following simple macroeconomic model for the U.S. 
economy, say, for the period 1960-1999.* 

Private consumption function: 

$C_t = \alpha_0 + \alpha_1 Y_t + \alpha_2 C_{t-1} + u_{1t}, \qquad \alpha_1 > 0,\ 0 < \alpha_2 < 1$
Private gross investment function:

$I_t = \beta_0 + \beta_1 Y_t + \beta_2 R_t + \beta_3 I_{t-1} + u_{2t}, \qquad \beta_1 > 0,\ \beta_2 < 0,\ 0 < \beta_3 < 1$

A money demand function:

$R_t = \lambda_0 + \lambda_1 Y_t + \lambda_2 M_{t-1} + \lambda_3 P_t + \lambda_4 R_{t-1} + u_{3t}$

$\lambda_1 > 0,\ \lambda_2 < 0,\ \lambda_3 > 0,\ 0 < \lambda_4 < 1$

*Adapted from H. R. Seddighi, K. A. Lawler, and A. V. Katos, Econometrics: A Practical Approach,
Routledge, New York, 2000, p. 204.

Income identity:

$Y_t = C_t + I_t + G_t$

where C = real private consumption; I = real gross private investment; G = real
government expenditure; Y = real GDP; M = M2 money supply at current prices;
R = long-term interest rate (%); and P = Consumer Price Index. The endogenous
variables are C, I, R, and Y. The predetermined variables are $C_{t-1}$, $I_{t-1}$,
$M_{t-1}$, $P_t$, $R_{t-1}$, and $G_t$, plus the intercept term. The u's are the error terms.

a. Using the order condition of identification, determine which of the four equations
are identified, either exactly or over-identified.

b. Which method(s) do you use to estimate the identified equations? 

c. Obtain suitable data from government and/or private sources, estimate the 
model, and comment on your results. 

20.15. In this exercise we examine data for 534 workers obtained from the Current Population
Survey (CPS) for 1985. The data can be found as Table 20.10 on the textbook
website.* The variables in this table are defined as follows:

W = wages ($ per hour); occup = occupation; sector = 1 for manufacturing, 2 for
construction, 0 for other; union = 1 if union member, 0 otherwise; educ = years of
schooling; exper = work experience in years; age = age in years; sex = 1 for
female; marital status = 1 if married; race = 1 for other, 2 for Hispanic, 3 for white;
region = 1 if lives in the South.

Consider the following simple wage determination model: 

$\ln W_i = \beta_1 + \beta_2 \text{Educ}_i + \beta_3 \text{Exper}_i + \beta_4 \text{Exper}_i^2 + u_i$ (1)

a. Suppose education, like wages, is endogenous. How would you find out that in 
Equation (1) education is in fact endogenous? Use the data given in the table in 
your analysis. 

b. Does the Hausman test support your analysis in (a)? Explain fully. 

20.16. Class Exercise: Consider the following demand-and-supply model for loans of
commercial banks to businesses:

Demand: $Q_t^d = \alpha_1 + \alpha_2 R_t + \alpha_3 RD_t + \alpha_4 IPI_t + u_{1t}$

Supply: $Q_t^s = \beta_1 + \beta_2 R_t + \beta_3 RS_t + \beta_4 TBD_t + u_{2t}$

where Q = total commercial bank loans ($ billion); R = average prime rate; RS =
3-month Treasury bill rate; RD = AAA corporate bond rate; IPI = Index of
Industrial Production; and TBD = total bank deposits.

a. Collect data on these variables for the period 1980-2007 from various sources, 
such as www.economagic.com, the website of the Federal Reserve Bank of 
St. Louis, or any other source. 

b. Are the demand and supply functions identified? List which variables are 
endogenous and which are exogenous. 

c. How would you go about estimating the demand and supply functions listed 
above? Show the necessary calculations. 

d. Why are both R and RS included in the model? What is the role of IPI in the 
model? 


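*Data can be found on the Web, at http://lib.s


j.edu/datasets/cps_85_wages.

Appendix 20A


20A.1 Bias in the Indirect Least-Squares Estimators


To show that the ILS estimators, although consistent, are biased, we use the demand-and-supply
model given in Eqs. (20.3.1) and (20.3.2). From Eq. (20.3.10) we obtain

$\hat\beta_1 = \dfrac{\hat\Pi_3}{\hat\Pi_1}$

where

$\hat\Pi_3 = \dfrac{\sum q_t x_t}{\sum x_t^2}$ from Eq. (20.3.7)

$\hat\Pi_1 = \dfrac{\sum p_t x_t}{\sum x_t^2}$ from Eq. (20.3.5)

Therefore, on substitution, we obtain

$\hat\beta_1 = \dfrac{\sum q_t x_t}{\sum p_t x_t}$ (1)

Using Eqs. (20.3.3) and (20.3.4), we obtain

$p_t = \Pi_1 x_t + (w_t - \bar w)$ (2)

$q_t = \Pi_3 x_t + (v_t - \bar v)$ (3)

where $\bar w$ and $\bar v$ are the mean values of $w_t$ and $v_t$, respectively.
Substituting Eqs. (2) and (3) into Eq. (1), we obtain

$\hat\beta_1 = \dfrac{\Pi_3 \sum x_t^2 + \sum (v_t - \bar v) x_t}{\Pi_1 \sum x_t^2 + \sum (w_t - \bar w) x_t} = \dfrac{\Pi_3 + \sum (v_t - \bar v) x_t / \sum x_t^2}{\Pi_1 + \sum (w_t - \bar w) x_t / \sum x_t^2}$ (4)

Since the expectation operator E is a linear operator, we cannot take the expectation of Eq. (4),
which is a ratio of random quantities (the expectation of a ratio is not, in general, the ratio of the
expectations), although it is clear that $E(\hat\beta_1) \neq \Pi_3/\Pi_1$ generally. (Why?)

But as the sample size tends to infinity, we can obtain

$\operatorname{plim}(\hat\beta_1) = \dfrac{\operatorname{plim}\Pi_3 + \operatorname{plim}\left[\sum (v_t - \bar v) x_t / \sum x_t^2\right]}{\operatorname{plim}\Pi_1 + \operatorname{plim}\left[\sum (w_t - \bar w) x_t / \sum x_t^2\right]}$ (5)

where use is made of the properties of plim, namely, that

$\operatorname{plim}\left(\dfrac{A}{B}\right) = \dfrac{\operatorname{plim} A}{\operatorname{plim} B}$ (provided $\operatorname{plim} B \neq 0$)

Now as the sample size is increased indefinitely, the second term in both the denominator and the
numerator of Eq. (5) tends to zero (why?), yielding

$\operatorname{plim}(\hat\beta_1) = \dfrac{\Pi_3}{\Pi_1} = \beta_1$ (6)

showing that, although biased, $\hat\beta_1$ is a consistent estimator of $\beta_1$.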
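The consistency result in Eq. (6) can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration, not taken from the text: the parameter values of the demand and supply equations, a uniform distribution for the exogenous variable X, and independent standard normal disturbances. For each simulated sample it computes $\hat\beta_1$ from Eq. (1) in deviation form and reports the average absolute estimation error for increasing sample sizes.

```python
import random
import statistics

# Hypothetical structural parameters, chosen only for illustration.
ALPHA0, ALPHA1, ALPHA2 = 10.0, -1.0, 0.5  # demand: Q = a0 + a1*P + a2*X + u1
BETA0, BETA1 = 2.0, 1.0                   # supply: Q = b0 + b1*P + u2 (true beta1 = 1)


def simulate_market(n, rng):
    """Draw n market-clearing observations (x, p, q) from the two-equation model."""
    xs, ps, qs = [], [], []
    for _ in range(n):
        x = rng.uniform(0.0, 20.0)  # exogenous variable X (assumed distribution)
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Set demand equal to supply and solve for the equilibrium price.
        p = (ALPHA0 - BETA0 + ALPHA2 * x + u1 - u2) / (BETA1 - ALPHA1)
        q = BETA0 + BETA1 * p + u2
        xs.append(x)
        ps.append(p)
        qs.append(q)
    return xs, ps, qs


def ils_beta1(xs, ps, qs):
    """ILS estimate of the supply slope, Eq. (1): sum(q_t x_t) / sum(p_t x_t) in deviations."""
    mx, mp, mq = statistics.fmean(xs), statistics.fmean(ps), statistics.fmean(qs)
    num = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    den = sum((p - mp) * (x - mx) for p, x in zip(ps, xs))
    return num / den


def mean_abs_error(n, reps=200, seed=42):
    """Average |beta1_hat - beta1| over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    errors = [abs(ils_beta1(*simulate_market(n, rng)) - BETA1) for _ in range(reps)]
    return statistics.fmean(errors)


for n in (20, 200, 2000):
    print(n, round(mean_abs_error(n), 4))
```

With these made-up settings the average error should shrink roughly in proportion to $1/\sqrt{n}$. The sketch speaks to consistency; how large the small-sample bias is depends on the parameter values and error variances, so a visible bias is not guaranteed for every choice.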
20A.2 Estimation of Standard Errors of 2SLS Estimators


The purpose of this appendix is to show that the standard errors of the estimates obtained from the
second-stage regression of the 2SLS procedure, using the formula applicable in OLS estimation, are
not the "proper" estimates of the "true" standard errors. To see this, we use the income-money supply
model given in Eqs. (20.4.1) and (20.4.2). We estimate the parameters of the overidentified money
supply function from the second-stage regression as

$Y_{2t} = \beta_{20} + \beta_{21} \hat Y_{1t} + u_t^*$ (20.4.6)

where

$u_t^* = u_{2t} + \beta_{21} \hat u_{1t}$ (7)

Now when we run regression (20.4.6), the standard error of, say, $\hat\beta_{21}$ is obtained from the following
expression:

$\operatorname{var}(\hat\beta_{21}) = \dfrac{\hat\sigma_{u^*}^2}{\sum \hat y_{1t}^2}$ (8)

where

$\hat\sigma_{u^*}^2 = \dfrac{\sum (\hat u_t^*)^2}{n - 2} = \dfrac{\sum (Y_{2t} - \hat\beta_{20} - \hat\beta_{21} \hat Y_{1t})^2}{n - 2}$ (9)

But $\hat\sigma_{u^*}^2$ is not the same thing as $\hat\sigma_{u_2}^2$, where the latter is a consistent estimate of the true variance
of $u_2$. This difference can be readily verified from Eq. (7). To obtain the true (as defined previously)
$\hat\sigma_{u_2}^2$, we proceed as follows:

$\hat u_{2t} = Y_{2t} - \hat\beta_{20} - \hat\beta_{21} Y_{1t}$

where $\hat\beta_{20}$ and $\hat\beta_{21}$ are the estimates from the second-stage regression. Hence,

$\hat\sigma_{u_2}^2 = \dfrac{\sum (Y_{2t} - \hat\beta_{20} - \hat\beta_{21} Y_{1t})^2}{n - 2}$ (10)

Note the difference between Eqs. (9) and (10): In Eq. (10) we use actual $Y_1$ rather than the estimated
$Y_1$ from the first-stage regression.

Having estimated Eq. (10), the easiest way to correct the standard errors of coefficients estimated
in the second-stage regression is to multiply each one of them by $\hat\sigma_{u_2}/\hat\sigma_{u^*}$. Note that if $Y_{1t}$ and $\hat Y_{1t}$ are
very close, that is, the $R^2$ in the first-stage regression is very high, the correction factor $\hat\sigma_{u_2}/\hat\sigma_{u^*}$ will
be close to 1, in which case the estimated standard errors in the second-stage regression may be taken
as the true estimates. But in other situations, we shall have to use the preceding correction factor.
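The correction can be sketched numerically. The example below is a hypothetical illustration, not the income-money supply model of Eqs. (20.4.1) and (20.4.2): it simulates a made-up equation $Y_2 = \beta_{20} + \beta_{21} Y_1 + u_2$ in which $Y_1$ shares part of its disturbance with $u_2$, uses a single assumed exogenous variable x as the only first-stage regressor, and then evaluates Eqs. (8) to (10) and the factor $\hat\sigma_{u_2}/\hat\sigma_{u^*}$. Multiplying the naive standard error by this factor is the same as putting $\hat\sigma_{u_2}^2$ in place of $\hat\sigma_{u^*}^2$ in Eq. (8).

```python
import math
import random


def simple_ols(x, y):
    """Intercept and slope from the OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_sls_corrected_se(x, y1, y2):
    """2SLS for y2 = b20 + b21*y1 + u2, with x as the single (hypothetical) instrument.

    Returns (b21_hat, naive_se, corrected_se, correction_factor).
    """
    n = len(x)
    # Stage 1: regress the endogenous regressor y1 on the exogenous variable x.
    a, b = simple_ols(x, y1)
    y1_hat = [a + b * xi for xi in x]
    # Stage 2: regress y2 on the fitted values, as in Eq. (20.4.6).
    b20, b21 = simple_ols(y1_hat, y2)
    # Eq. (9): residual variance built from the fitted y1_hat.
    s2_star = sum((v - b20 - b21 * f) ** 2 for v, f in zip(y2, y1_hat)) / (n - 2)
    # Eq. (10): residual variance built from the actual y1.
    s2_u2 = sum((v - b20 - b21 * w) ** 2 for v, w in zip(y2, y1)) / (n - 2)
    # Eq. (8): the standard error a plain second-stage OLS run would report.
    mean_hat = sum(y1_hat) / n
    naive_se = math.sqrt(s2_star / sum((f - mean_hat) ** 2 for f in y1_hat))
    factor = math.sqrt(s2_u2 / s2_star)  # sigma_u2 / sigma_u*
    return b21, naive_se, naive_se * factor, factor


# Made-up data: y1 shares part of its disturbance with u2, so y1 is endogenous.
rng = random.Random(7)
n_obs = 500
x = [rng.uniform(0.0, 10.0) for _ in range(n_obs)]
u2 = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
y1 = [1.0 + 2.0 * xi + 0.2 * e + 0.2 * rng.gauss(0.0, 1.0) for xi, e in zip(x, u2)]
y2 = [3.0 + 0.5 * w + e for w, e in zip(y1, u2)]

b21, naive, corrected, factor = two_sls_corrected_se(x, y1, y2)
print(f"b21 = {b21:.4f}, naive SE = {naive:.5f}, corrected SE = {corrected:.5f}, factor = {factor:.3f}")
```

In this particular made-up setting the factor comes out below 1, so the uncorrected second-stage standard error overstates the sampling error of $\hat\beta_{21}$; with other data the factor can lie on either side of 1.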






Chapter 21

Time Series Econometrics: Some Basic Concepts

We noted in Chapter 1 that one of the important types of data used in empirical analysis is 
time series data. In this and the following chapter we take a closer look at such data not 
only because of the frequency with which they are used in practice but also because they
pose several challenges to econometricians and practitioners. 

First, empirical work based on time series data assumes that the underlying time series 
is stationary. Although we have discussed the concept of stationarity intuitively in Chapter 1,
we discuss it more fully in this chapter. More specifically, we will try to find out what
stationarity means and why one should worry about it.

Second, in Chapter 12, on autocorrelation, we discussed several causes of autocorrelation.
Sometimes autocorrelation results because the underlying time series is nonstationary.

Third, in regressing a time series variable on another time series variable(s), one often
obtains a very high $R^2$ (in excess of 0.9) even though there is no meaningful relationship
between the two variables. Sometimes we expect no relationship between two variables, yet
a regression of one on the other often shows a significant relationship. This situation
exemplifies the problem of spurious, or nonsense, regression, whose nature will be
explored shortly. It is therefore very important to find out if the relationship between
economic variables is spurious or nonsensical. We will see in this chapter how spurious
regressions can arise if time series are not stationary.

Fourth, some financial time series, such as stock prices, exhibit what is known as the
random walk phenomenon. This means that the price of a stock, say IBM, tomorrow is equal
to its price today plus a purely random shock (or error term), so that the best prediction of
tomorrow's price is simply today's price. If this were in fact the case, forecasting asset
prices would be a futile exercise.

Fifth, regression models involving time series data are often used for forecasting. In 
view of the preceding discussion, we would like to know if such forecasting is valid if the 
underlying time series are not stationary. 

Finally, causality tests (recall the Granger and Sims causality tests discussed in
Chapter 17) assume that the time series involved in analysis are stationary. Therefore, tests of
stationarity should precede tests of causality. 

At the outset a disclaimer is in order. The topic of time series analysis is so vast and
evolving, and some of the mathematics underlying the various techniques of time series
analysis is so involved, that the best we hope to achieve in an introductory text like this is to
give the reader a glimpse of some of the fundamental concepts of time series analysis. For
those who want to pursue this topic further, we provide references.¹

21.1 A Look at Selected U.S. Economic Time Series 


To set the ball rolling, and to give the reader a feel for the somewhat esoteric concepts of 
time series analysis to be developed in this chapter, it might be useful to consider several 
U.S. economic time series of general interest. The time series we consider are: 

DPI = real disposable personal income (billions of dollars) 

GDP = gross domestic product (billions of dollars) 

PCE = real personal consumption expenditure (billions of dollars) 

CP = corporate profits (billions of dollars) 

Dividend = dividends (billions of dollars)

The time period covered is from 1947-I to 2007-IV, for a total of 244 quarters, and all data
are seasonally adjusted at the annual rate. All the data are collected from FRED, the
economic website of the Federal Reserve Bank of St. Louis. GDP, DPI, and PCE are in
constant dollars, here 2000 dollars. CP and Dividend are in nominal dollars.

To save space, the raw data are posted on the book’s website. But to get some idea of 
these data, we have plotted them in the following two figures. Figure 21.1 is a plot of the
logarithms of GDP, DPI, and PCE, and Figure 21.2 presents the logs of the other
two time series (CP and Dividend). It is common practice to plot the log of a time series
to get a glimpse of its growth rate, since the slope of a log plot over an interval approximates
the growth rate of the series over that interval. A visual plot of the data is usually the
first step in the analysis of time series. In these figures the letter L denotes the natural
logarithm.
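The link between log levels and growth rates can be checked numerically. The sketch below uses a short, made-up quarterly series (hypothetical numbers, not the FRED data described above) and compares the quarter-to-quarter change in the natural log with the exact proportional growth rate; for small growth rates the two are nearly equal, with the log change slightly smaller.

```python
import math

# Hypothetical quarterly levels (illustrative only, not the FRED series in the text).
levels = [100.0, 101.5, 103.0, 104.2, 106.0]

# Change in the natural log between consecutive quarters.
log_diff = [math.log(b) - math.log(a) for a, b in zip(levels, levels[1:])]
# Exact proportional growth rate between consecutive quarters.
exact_growth = [b / a - 1.0 for a, b in zip(levels, levels[1:])]

for d, g in zip(log_diff, exact_growth):
    print(f"log-difference = {d:.5f}   exact growth = {g:.5f}")
```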

The first impression we get from these two figures is that all these time series seem to be 
“trending” upward, albeit with fluctuations. Suppose we want to speculate on the shape of 
these curves beyond the sample period, say for all the quarters of 2008.² We can do that if
we know the statistical, or stochastic, mechanism, or the data generating process (DGP)
that generated these curves. But what is that mechanism? To answer this and related
questions, we need to study some “new” vocabulary that has been developed by time series
analysts, to which we now turn.


1 At the introductory level, these references may be helpful: Gary Koop, Analysis of Economic Data,
John Wiley & Sons, New York, 2000; Jeff B. Cromwell, Walter C. Labys, and Michel Terraza, Univariate
Tests for Time Series Models, Sage Publications, Newbury Park, California, 1994; Jeff B. Cromwell,
Michael H. Hannan, Walter C. Labys, and Michel Terraza, Multivariate Tests for Time Series Models,
Sage Publications, Newbury Park, California, 1994; and H. R. Seddighi, K. A. Lawler, and A. V. Katos,
Econometrics: A Practical Approach, Routledge, New York, 2000. At the intermediate level, see Walter
Enders, Applied Econometric Time Series, John Wiley & Sons, New York, 1995; Kerry Patterson, An
Introduction to Applied Econometrics: A Time Series Approach, St. Martin's Press, New York, 2000; T. C. Mills,
The Econometric Modelling of Financial Time Series, 2d ed., Cambridge University Press, New York,
1999; Marno Verbeek, A Guide to Modern Econometrics, John Wiley & Sons, New York, 2000; and
Wojciech W. Charemza and Derek F. Deadman, New Directions in Econometric Practice: General to
Specific Modelling and Vector Autoregression, 2d ed., Edward Elgar Publisher, New York, 1997. At the
advanced level, see J. D. Hamilton, Time Series Analysis, Princeton University Press, Princeton, NJ,
1994, and G. S. Maddala and In-Moo Kim, Unit Roots, Cointegration, and Structural Change,
Cambridge University Press, 1998. At the applied level, see B. Bhaskara Rao, ed., Cointegration for the
Applied Economist, St. Martin's Press, New York, 1994, and Chandan Mukherjee, Howard White, and
Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, New York, 1998.

2 Of course, we now have the actual data for this period and could compare them with the data
"predicted" on the basis of the earlier period.

FIGURE 21.1 Logarithms of real GDP, DPI, and PCE, United States, 1947-2007 (quarterly, $ billions).

FIGURE 21.2 Logarithms of corporate profits (CP) and dividends, United States, 1947-2007
(quarterly, $ billions). Note: L denotes logarithm.

21.2 Key Concepts³


What is this vocabulary? It consists of concepts such as these: 

1. Stochastic processes 

2. Stationary processes

3. Purely random processes 

4. Nonstationary processes 

5. Integrated variables 

6. Random walk models 

7. Cointegration 

8. Deterministic and stochastic trends 

9. Unit root tests 

In what follows we will discuss each of these concepts. Our discussion will often be heuristic. 
Wherever possible and helpful, we will provide appropriate examples. 

3 The following discussion is based on Maddala et al., op. cit., Charemza et al., op. cit., and Carol
Alexander, Market Models: A Guide to Financial Data Analysis, John Wiley & Sons, New York, 2001.

21.3 Stochastic Processes


A random or stochastic process is a collection of random variables ordered in time.⁴ If we
let Y denote a random variable, and if it is continuous, we denote it as Y(t), but if it is
discrete, we denote it as $Y_t$. An example of the former is an electrocardiogram, and an
example of the latter is GDP, DPI, etc. Since most economic data are collected at discrete points
in time, for our purpose we will use the notation $Y_t$ rather than Y(t). If we let Y represent
GDP, for our data we have $Y_1, Y_2, Y_3, \ldots, Y_{242}, Y_{243}, Y_{244}$, where the subscript 1 denotes
the first observation (i.e., GDP for the first quarter of 1947) and the subscript 244 denotes
the last observation (i.e., GDP for the fourth quarter of 2007). Keep in mind that each of
these Y's is a random variable.

In what sense can we regard GDP as a stochastic process? Consider for instance the real
GDP of $3,759.997 billion for 1970-I. In theory, the GDP figure for the first quarter of
1970 could have been any number, depending on the economic and political climate then
prevailing. The figure of 3,759.997 is a particular realization of all such possibilities.⁵
Therefore, we can say that GDP is a stochastic process and the actual values we observed
for the period 1947-I to 2007-IV are particular realizations of that process (i.e., sample).
The distinction between the stochastic process and its realization is akin to the distinction
between population and sample in cross-sectional data. Just as we use sample data to draw
inferences about a population, in time series we use the realization to draw inferences about
the underlying stochastic process.

Stationary Stochastic Processes 

A type of stochastic process that has received a great deal of attention and scrutiny by time 
series analysts is the so-called stationary stochastic process. Broadly speaking, a stochastic
process is said to be stationary if its mean and variance are constant over time and the
value of the covariance between the two time periods depends only on the distance or gap or
lag between the two time periods and not the actual time at which the covariance is computed.
In the time series literature, such a stochastic process is known as a weakly stationary, or
covariance stationary, or second-order stationary, or wide sense, stochastic process. For
the purpose of this chapter, and in most practical situations, this type of stationarity often
suffices.⁶

To explain weak stationarity, let $Y_t$ be a stochastic time series with these properties:

Mean: $E(Y_t) = \mu$ (21.3.1)

Variance: $\operatorname{var}(Y_t) = E(Y_t - \mu)^2 = \sigma^2$ (21.3.2)

Covariance: $\gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)]$ (21.3.3)

where $\gamma_k$, the covariance (or autocovariance) at lag k, is the covariance between the values
of $Y_t$ and $Y_{t+k}$, that is, between two Y values k periods apart. If k = 0, we obtain $\gamma_0$, which
is simply the variance of Y (= $\sigma^2$); if k = 1, $\gamma_1$ is the covariance between two adjacent
values of Y, the type of covariance we encountered in Chapter 12 (recall the Markov first-order
autoregressive scheme).

4 The term "stochastic" comes from the Greek word "stokhos," which means a target or bull's-eye. If
you have ever thrown darts on a dart board with the aim of hitting the bull's-eye, how often did you
hit the bull's-eye? Out of a hundred darts you may be lucky to hit the bull's-eye only a few times; at
other times the darts will be spread randomly around the bull's-eye.

5 You can think of the value of $3,759.997 billion as the mean value of all possible values of GDP for
the first quarter of 1970.

6 A time series is strictly stationary if all the moments of its probability distribution, and not just the first
two (i.e., mean and variance), are invariant over time. If, however, the stationary process is normal,
the weakly stationary stochastic process is also strictly stationary, for the normal stochastic process is
fully specified by its two moments, the mean and the variance.
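The sample counterparts of $\gamma_k$ in Eq. (21.3.3) are straightforward to compute. The sketch below assumes the common convention of dividing by n rather than n − k (an assumption; other texts use n − k), and the function names are illustrative only. The autocorrelation at lag k is $\gamma_k/\gamma_0$.

```python
def sample_autocovariance(y, k):
    """Sample counterpart of gamma_k in Eq. (21.3.3), dividing by n (assumed convention)."""
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    mean = sum(y) / n
    return sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / n


def sample_autocorrelation(y, k):
    """rho_k = gamma_k / gamma_0, so rho_0 = 1 by construction."""
    return sample_autocovariance(y, k) / sample_autocovariance(y, 0)


y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_autocovariance(y, 0))   # → 2.0 (gamma_0, the sample variance)
print(sample_autocovariance(y, 1))   # → 0.8
print(sample_autocorrelation(y, 1))  # → 0.4
```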

Suppose we shift the origin of Y from $Y_t$ to $Y_{t+m}$ (say, from the first quarter of 1947 to
the first quarter of 1952 for our GDP data). Now if $Y_t$ is to be stationary, the mean, variance,
and autocovariances of $Y_{t+m}$ must be the same as those of $Y_t$. In short, if a time series is
stationary, its mean, variance, and autocovariance (at various lags) remain the same no
matter at what point we measure them; that is, they are time invariant. Such a time series will
tend to return to its mean (called mean reversion), and fluctuations around this mean
(measured by its variance) will have a broadly constant amplitude.⁷ To put it differently, a
stationary process will not drift too far away from its mean value because of the finite
variance. As we shall see shortly, this is not the case with nonstationary stochastic processes. It
should be noted that for a stationary process the speed of mean reversion depends on the
autocovariances: it is quick if the autocovariances are small and slow when they are large,
as we will show shortly.
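To make the link between autocovariances and the speed of mean reversion concrete, here is a minimal sketch that assumes the Markov first-order autoregressive scheme of Chapter 12, $Y_t = \rho Y_{t-1} + u_t$ with $0 < \rho < 1$, for which the lag-k autocorrelation is $\rho^k$. The half-life, the number of periods for a shock to shrink to half its size, is $\ln 0.5 / \ln \rho$: the larger the autocorrelations, the longer the return to the mean takes. The function names are illustrative.

```python
import math


def ar1_autocorrelation(rho, k):
    """Lag-k autocorrelation of a stationary AR(1) process: rho**k."""
    return rho ** k


def ar1_half_life(rho):
    """Periods until the effect of a shock falls to one half, assuming 0 < rho < 1."""
    if not 0.0 < rho < 1.0:
        raise ValueError("this sketch assumes 0 < rho < 1")
    return math.log(0.5) / math.log(rho)


for rho in (0.5, 0.9, 0.99):
    print(rho, round(ar1_autocorrelation(rho, 4), 4), round(ar1_half_life(rho), 2))
```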

If a time series is not stationary in the sense just defined, it is called a nonstationary time 
series (keep in mind we are talking only about weak stationarity). In other words, a
nonstationary time series will have a time-varying mean or a time-varying variance or both.

Why are stationary time series so important? Because if a time series is nonstationary, 
we can study its behavior only for the time period under consideration. Each set of time
series data will therefore be for a particular episode. As a consequence, it is not possible to
generalize it to other time periods. Therefore, for the purpose of forecasting, such
(nonstationary) time series may be of little practical value.

How do we know that a particular time series is stationary? In particular, are the time
series shown in Figures 21.1 and 21.2 stationary? We will take this important topic up in
Sections 21.8 and 21.9, where we will consider several tests of stationarity. But if we depend
on common sense, it would seem that the time series depicted in Figures 21.1 and 21.2 are 
nonstationary, at least in the mean values. But more on this later. 

Before we move on, we mention a special type of stochastic process (or time series),
namely, a purely random, or white noise, process. We call a stochastic process purely
random if it has zero mean, constant variance $\sigma^2$, and is serially uncorrelated.⁸ You may recall
that the error term $u_t$, entering the classical normal linear regression model that we
discussed in Part 1 of this book, was assumed to be a white noise process, which we denoted
as $u_t \sim \text{IIDN}(0, \sigma^2)$; that is, $u_t$ is independently and identically distributed as a normal
distribution with zero mean and constant variance. Such a process is called a Gaussian white
noise process.
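A Gaussian white noise sample is easy to generate and inspect. The sketch below uses Python's standard `random` module with an arbitrary seed and sample size (both assumptions) and checks the three defining properties approximately: sample mean near 0, sample variance near $\sigma^2$, and first-order autocorrelation near 0.

```python
import random
import statistics


def gaussian_white_noise(n, sigma=1.0, seed=None):
    """Draw n independent N(0, sigma^2) values: a Gaussian white noise sample."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def lag1_autocorrelation(u):
    """Sample first-order autocorrelation: lagged cross-products over the sum of squares."""
    m = statistics.fmean(u)
    num = sum((a - m) * (b - m) for a, b in zip(u, u[1:]))
    den = sum((a - m) ** 2 for a in u)
    return num / den


u = gaussian_white_noise(10_000, sigma=1.0, seed=2024)
print(round(statistics.fmean(u), 3))      # close to 0
print(round(statistics.variance(u), 3))   # close to sigma^2 = 1
print(round(lag1_autocorrelation(u), 3))  # close to 0 (serially uncorrelated)
```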

Nonstationary Stochastic Processes 

Although our interest is in stationary time series, one often encounters nonstationary time 
series, the classic example being the random walk model (RWM).⁹ It is often said that asset
prices, such as stock prices or exchange rates, follow a random walk; that is, they are
nonstationary. We distinguish two types of random walks: (1) random walk without drift (i.e., no
constant or intercept term) and (2) random walk with drift (i.e., a constant term is present). 

7 This point has been made by Keith Cuthbertson, Stephen G. Hall, and Mark P. Taylor, Applied
Econometric Techniques, The University of Michigan Press, 1995, p. 130.

8 If it is also independent, such a process is called strictly white noise.

9 The term random walk is often compared with a drunkard's walk. Leaving a bar, the drunkard moves
a random distance $u_t$ at time t, and, continuing to walk indefinitely, will eventually drift farther and
farther away from the bar. The same is said about stock prices. Today's stock price is equal to
yesterday's stock price plus a random shock.

Random Walk without Drift

Suppose $u_t$ is a white noise error term with mean 0 and variance $\sigma^2$. Then the series $Y_t$ is said
to be a random walk if

$Y_t = Y_{t-1} + u_t$ (21.3.4)

In the random walk model, as Eq. (21.3.4) shows, the value of Y at time t is equal to its
value at time (t − 1) plus a random shock; thus it is an AR(1) model in the language of
Chapters 12 and 17. We can think of Eq. (21.3.4) as a regression of Y at time t on its value 
lagged one period. Believers in the efficient capital market hypothesis argue that stock 
prices are essentially random and therefore there is no scope for profitable speculation in 
the stock market: If one could predict tomorrow’s price on the basis of today’s price, we 
would all be millionaires. 

Now from Eq. (21.3.4) we can write 

$Y_1 = Y_0 + u_1$
$Y_2 = Y_1 + u_2 = Y_0 + u_1 + u_2$
$Y_3 = Y_2 + u_3 = Y_0 + u_1 + u_2 + u_3$

In general, if the process started at some time 0 with a value of $Y_0$, we have

$Y_t = Y_0 + \sum_{i=1}^{t} u_i$ (21.3.5)

Therefore,

$E(Y_t) = E\left(Y_0 + \sum_{i=1}^{t} u_i\right) = Y_0$ (why?) (21.3.6)

In like fashion, it can be shown that

$\operatorname{var}(Y_t) = t\sigma^2$ (21.3.7)


As the preceding expressions show, the mean of Y is equal to its initial, or starting, value,
which is constant, but as t increases, its variance increases indefinitely, thus violating a
condition of stationarity. In short, the RWM without drift is a nonstationary stochastic process.
In practice $Y_0$ is often set at zero, in which case $E(Y_t) = 0$.
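Eqs. (21.3.6) and (21.3.7) can be checked by simulating many realizations of the same random walk and looking across them at a fixed date t. The sketch assumes $Y_0 = 0$, $\sigma = 1$, 2,000 paths of 100 steps, and an arbitrary seed; the cross-sectional mean should stay near $Y_0 = 0$ while the cross-sectional variance grows roughly like $t\sigma^2 = t$.

```python
import random
import statistics


def random_walk_paths(n_paths, n_steps, sigma=1.0, y0=0.0, seed=None):
    """Simulate n_paths realizations of Y_t = Y_{t-1} + u_t with u_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        y, path = y0, []
        for _ in range(n_steps):
            y += rng.gauss(0.0, sigma)
            path.append(y)  # path[t - 1] holds Y_t
        paths.append(path)
    return paths


paths = random_walk_paths(n_paths=2000, n_steps=100, seed=1)
for t in (10, 50, 100):
    values_at_t = [p[t - 1] for p in paths]  # Y_t across realizations
    print(t, round(statistics.fmean(values_at_t), 2), round(statistics.variance(values_at_t), 1))
```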

An interesting feature of the RWM is the persistence of random shocks (i.e., random
errors), which is clear from Eq. (21.3.5): $Y_t$ is the sum of the initial $Y_0$ plus the sum of random
shocks. As a result, the impact of a particular shock does not die away. For example, if
$u_2 = 2$ rather than $u_2 = 0$, then all $Y_t$'s from $Y_2$ onward will be 2 units higher, and the
effect of this shock never dies out. That is why a random walk is said to have an infinite
memory: as Kerry Patterson notes, a random walk remembers the shock forever.¹⁰ The sum
$\sum u_i$ is also known as a stochastic trend, about which more will be said shortly.
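The persistence of a single shock can be checked directly. In this sketch the shock sequences are hypothetical, chosen as exact binary fractions so the floating-point arithmetic is exact, and the two sequences differ only in $u_2$, which is 0 in one and 2 in the other.

```python
def random_walk(shocks, y0=0.0):
    """Return [Y_1, ..., Y_T] for Y_t = Y_{t-1} + u_t (Eq. 21.3.4)."""
    path, y = [], y0
    for u in shocks:
        y += u
        path.append(y)
    return path


# Hypothetical shocks u_1..u_5; the second list differs only in u_2.
base = [0.5, 0.0, -1.0, 0.25, 0.75]
shocked = [0.5, 2.0, -1.0, 0.25, 0.75]

gap = [b - a for a, b in zip(random_walk(base), random_walk(shocked))]
print(gap)  # → [0.0, 2.0, 2.0, 2.0, 2.0]
```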

Interestingly, if you write Eq. (21.3.4) as

$(Y_t - Y_{t-1}) = \Delta Y_t = u_t$ (21.3.8)

where Δ is the first difference operator that we discussed in Chapter 12, it is easy to show
that, while $Y_t$ is nonstationary, its first difference is stationary: $\Delta Y_t$ is simply the white
noise term $u_t$, whose mean, variance, and autocovariances do not depend on time. In other
words, the first differences of a random walk time series are stationary. But we will have
more to say about this later.
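The same point can be verified numerically. The sketch below (arbitrary seed and length, standard normal shocks, $Y_0 = 0$, all assumptions) builds a random walk and applies Eq. (21.3.8); the first differences reproduce the original white noise shocks apart from floating-point rounding.

```python
import random


def random_walk(shocks, y0=0.0):
    """Build [Y_1, ..., Y_T] from Y_t = Y_{t-1} + u_t (Eq. 21.3.4)."""
    path, y = [], y0
    for u in shocks:
        y += u
        path.append(y)
    return path


def first_difference(series, y0=0.0):
    """Delta Y_t = Y_t - Y_{t-1}, using y0 as the value before the first observation."""
    previous = [y0] + series[:-1]
    return [y - p for y, p in zip(series, previous)]


rng = random.Random(11)
shocks = [rng.gauss(0.0, 1.0) for _ in range(1000)]
levels = random_walk(shocks)
diffs = first_difference(levels)
# Differencing recovers the shocks, up to floating-point rounding.
print(max(abs(d - u) for d, u in zip(diffs, shocks)))
```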


10 Kerry Patterson, op. cit., Chapter 6.

FIGURE 21.3 A random walk without drift.


Random Walk with Drift 

Let us modify Eq. (21.3.4) as follows: 

$Y_t = \delta + Y_{t-1} + u_t$ (21.3.9)

where δ is known as the drift parameter. The name drift comes from the fact that if we
write the preceding equation as

$Y_t - Y_{t-1} = \Delta Y_t = \delta + u_t$ (21.3.10)

it shows that $Y_t$ drifts upward or downward, depending on δ being positive or negative. Note
that model (21.3.9) is also an AR(1) model.

Following the procedure discussed for random walk without drift, it can be shown that 
for the random walk with drift model (21.3.9), 

$E(Y_t) = Y_0 + t \cdot \delta$ (21.3.11)

$\operatorname{var}(Y_t) = t\sigma^2$ (21.3.12)

As you can see, for RWM with drift the mean as well as the variance increases over time, 
again violating the conditions of (weak) stationarity. In short, RWM, with or without drift, 
is a nonstationary stochastic process. 

To give a glimpse of the random walk with and without drift, we conducted two simulations as follows:

$$Y_t = Y_{t-1} + u_t \tag{21.3.13}$$

where the $u_t$ are white noise error terms such that each $u_t \sim N(0, 1)$; that is, each $u_t$ follows the standard normal distribution. From a random number generator, we obtained 500 values of $u$ and generated $Y_t$ as shown in Eq. (21.3.13). We assumed $Y_0 = 0$. Thus, Eq. (21.3.13) is an RWM without drift.

Now consider

$$Y_t = \delta + Y_{t-1} + u_t \tag{21.3.14}$$

which is an RWM with drift. We assumed $u_t$ and $Y_0$ as in Eq. (21.3.13) and assumed that $\delta = 2$.

The graphs of models (21.3.13) and (21.3.14) are shown in Figures 21.3 and 21.4, respectively. The reader can compare these two diagrams in light of our discussion of the RWM with and without drift.
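The two simulations can be reproduced in spirit with a few lines of Python. This is a sketch under stated assumptions. The seed is arbitrary, so the paths will not match Figures 21.3 and 21.4 point by point. Both series are driven by the same 500 shocks so that their relationship is easy to see.

```python
import random

def simulate_walk(delta, y0, shocks):
    """Y_t = delta + Y_{t-1} + u_t for each supplied shock u_t; returns Y_1..Y_n."""
    y, out = y0, []
    for u in shocks:
        y = delta + y + u
        out.append(y)
    return out

rng = random.Random(2024)                        # arbitrary seed
shocks = [rng.gauss(0.0, 1.0) for _ in range(500)]
no_drift = simulate_walk(0.0, 0.0, shocks)       # Eq. (21.3.13), Y_0 = 0
with_drift = simulate_walk(2.0, 0.0, shocks)     # Eq. (21.3.14), delta = 2

# With common shocks, the drift series is the driftless walk shifted by the line 2t.
gap = max(abs(w - (nd + 2.0 * t))
          for t, (nd, w) in enumerate(zip(no_drift, with_drift), start=1))
print(f"Y_500: {no_drift[-1]:.2f} without drift, {with_drift[-1]:.2f} with drift")
```

Feeding both models the same shocks shows that the drifting walk is the driftless walk plus the straight line $\delta t$. The drift moves the mean path but leaves the stochastic trend unchanged.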
FIGURE 21.4 A random walk with drift: $Y_t = 2 + Y_{t-1} + u_t$, with $Y_0 = 0$.


The random walk model is an example of what is known in the literature as a unit root 
process. Since this term has gained tremendous currency in the time series literature, we 
next explain what a unit root process is. 

21.4 Unit Root Stochastic Process 


Let us write the RWM (21.3.4) as

$$Y_t = \rho Y_{t-1} + u_t \qquad -1 \le \rho \le 1 \tag{21.4.1}$$

This model resembles the Markov first-order autoregressive model that we discussed in the chapter on autocorrelation. If $\rho = 1$, Eq. (21.4.1) becomes an RWM (without drift). If $\rho$ is in fact 1, we face what is known as the unit root problem, that is, a situation of nonstationarity; we already know that in this case the variance of $Y_t$ is not stationary. The name unit root is due to the fact that $\rho = 1$.¹¹ Thus the terms nonstationarity, random walk, unit root, and stochastic trend can be treated synonymously.

If, however, $|\rho| < 1$, that is, if the absolute value of $\rho$ is less than one, then it can be shown that the time series $Y_t$ is stationary in the sense we have defined it.¹²
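Footnote 12's claim about the variance can be checked exactly rather than by simulation. Iterating Eq. (21.4.1) back to $Y_0 = 0$ gives $Y_t = \sum_{j=0}^{t-1} \rho^j u_{t-j}$. With unit-variance, uncorrelated shocks this implies $\operatorname{var}(Y_t) = \sum_{j=0}^{t-1} \rho^{2j}$. The sketch below evaluates this sum for a few values of $\rho$; the function name is a hypothetical helper.

```python
def ar1_variance(t, rho, sigma2=1.0):
    """var(Y_t) for Y_t = rho*Y_{t-1} + u_t with Y_0 = 0 and var(u_t) = sigma2.

    Repeated substitution gives Y_t = sum_{j=0}^{t-1} rho**j * u_{t-j}; with
    uncorrelated shocks, var(Y_t) = sigma2 * sum_{j=0}^{t-1} rho**(2*j).
    """
    return sigma2 * sum(rho ** (2 * j) for j in range(t))

for rho in (0.5, 0.9, 1.0):
    limit = 1 / (1 - rho ** 2) if abs(rho) < 1 else float("inf")
    path = [round(ar1_variance(t, rho), 3) for t in (1, 10, 100, 1000)]
    print(f"rho = {rho}: var(Y_t) at t = 1, 10, 100, 1000 -> {path}; limit {limit:.3f}")
```

For $|\rho| < 1$ the variance settles at $1/(1-\rho^2)$: 4/3 for $\rho = 0.5$, and about 5.26 for $\rho = 0.9$. For $\rho = 1$ it equals $t$ and never settles.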

In practice, then, it is important to find out if a time series possesses a unit root.¹³ In Section 21.9 we will discuss several tests of unit root, that is, several tests of stationarity. In that section we will also determine whether the time series depicted in Figures 21.1 and 21.2 are stationary. The reader may suspect that they are not, but we shall see.

11 A technical point: If $\rho = 1$, we can write Eq. (21.4.1) as $Y_t - Y_{t-1} = u_t$. Now, using the lag operator $L$ so that $LY_t = Y_{t-1}$, $L^2 Y_t = Y_{t-2}$, and so on, we can write Eq. (21.4.1) as $(1 - L)Y_t = u_t$. The term unit root refers to the root of the polynomial in the lag operator. If you set $(1 - L) = 0$, you obtain $L = 1$, hence the name unit root.

12 Suppose that in Eq. (21.4.1) the initial value of $Y$ ($= Y_0$) is zero, $|\rho| < 1$, and $u_t$ is white noise, distributed normally with zero mean and unit variance. Then $E(Y_t) = 0$ and $\operatorname{var}(Y_t) = 1/(1 - \rho^2)$. (Strictly, the variance approaches this value as $t$ grows, since the process starts from $Y_0 = 0$.) Since both these are constants, $Y_t$ is stationary by the definition of weak stationarity. On the other hand, as we saw before, if $\rho = 1$, $Y_t$ is a random walk and hence nonstationary.

13 A time series may contain more than one unit root. But we will discuss this situation later in the 
chapter. 
21.5 Trend Stationary (TS) and Difference Stationary (DS) Stochastic Processes


The distinction between stationary and nonstationary stochastic processes (or time series) has a crucial bearing on a related question. The trend is the slow long-run evolution of the time series under consideration. We see such a trend in the constructed series of Figures 21.3 and 21.4 and in the actual economic series of Figures 21.1 and 21.2. Is that trend deterministic or stochastic? Broadly speaking, if the trend in a time series is a deterministic function of time, such as time, time-squared, etc., we call it a deterministic trend; if it is not predictable, we call it a stochastic trend. To make the definition more formal, consider the following model of the time series $Y_t$:

$$Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + u_t \tag{21.5.1}$$

where $u_t$ is a white noise error term and $t$ is time measured chronologically. Now we have the following possibilities:

Pure random walk: If in Eq. (21.5.1) $\beta_1 = 0$, $\beta_2 = 0$, $\beta_3 = 1$, we get

$$Y_t = Y_{t-1} + u_t \tag{21.5.2}$$

which is nothing but an RWM without drift and is therefore nonstationary. But note that, if we write Eq. (21.5.2) as

$$\Delta Y_t = (Y_t - Y_{t-1}) = u_t \tag{21.3.8}$$

it becomes stationary, as noted before. Hence, an RWM without drift is a difference stationary process (DSP).

Random walk with drift: If in Eq. (21.5.1) $\beta_1 \neq 0$, $\beta_2 = 0$, $\beta_3 = 1$, we get

$$Y_t = \beta_1 + Y_{t-1} + u_t \tag{21.5.3}$$

which is a random walk with drift and is therefore nonstationary. If we write it as

$$(Y_t - Y_{t-1}) = \Delta Y_t = \beta_1 + u_t \tag{21.5.3a}$$

this means $Y_t$ will exhibit a positive ($\beta_1 > 0$) or negative ($\beta_1 < 0$) trend (see Figure 21.4). Such a trend is called a stochastic trend. Equation (21.5.3a) shows that $Y_t$ is a DSP, because the nonstationarity in $Y_t$ can be eliminated by taking first differences of the time series. Remember that $u_t$ in Eq. (21.5.3a) is a white noise error term.

Deterministic trend: If in Eq. (21.5.1) $\beta_1 \neq 0$, $\beta_2 \neq 0$, $\beta_3 = 0$, we obtain

$$Y_t = \beta_1 + \beta_2 t + u_t \tag{21.5.4}$$

which is called a trend stationary process (TSP). The mean of $Y_t$ is $\beta_1 + \beta_2 t$, which is not constant, but its variance ($= \sigma^2$) is constant. Once the values of $\beta_1$ and $\beta_2$ are known, the mean can be forecast perfectly. Therefore, if we subtract the mean of $Y_t$ from $Y_t$, the resulting series will be stationary, hence the name trend stationary. This procedure of removing the (deterministic) trend is called detrending.
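Detrending can be sketched as an OLS regression of $Y_t$ on a constant and $t$, keeping the residuals. The parameter values $\beta_1 = 1$ and $\beta_2 = 0.5$, the unit error variance, and the seed are assumptions for illustration only.

```python
import random
import statistics

def ols_line(x, y):
    """Intercept and slope of the least-squares line y = a + b*x."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b

rng = random.Random(7)                       # arbitrary seed
beta1, beta2 = 1.0, 0.5                      # illustrative values, not from the text
t = list(range(1, 501))
y = [beta1 + beta2 * ti + rng.gauss(0.0, 1.0) for ti in t]   # Eq. (21.5.4)

a_hat, b_hat = ols_line(t, y)
detrended = [yi - (a_hat + b_hat * ti) for ti, yi in zip(t, y)]   # the detrended series
print(f"fitted trend: {a_hat:.3f} + {b_hat:.4f} t")
print(f"detrended: mean {statistics.fmean(detrended):.1e}, "
      f"variance {statistics.pvariance(detrended):.3f}")
```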

Random walk with drift and deterministic trend: If in Eq. (21.5.1) $\beta_1 \neq 0$, $\beta_2 \neq 0$, $\beta_3 = 1$, we obtain

$$Y_t = \beta_1 + \beta_2 t + Y_{t-1} + u_t \tag{21.5.5}$$

in which case we have a random walk with drift and a deterministic trend. This can be seen if we write the equation as

$$\Delta Y_t = \beta_1 + \beta_2 t + u_t \tag{21.5.5a}$$

which means that $Y_t$ is nonstationary.

Deterministic trend with stationary AR(1) component: If in Eq. (21.5.1) $\beta_1 \neq 0$, $\beta_2 \neq 0$, $|\beta_3| < 1$, then we get

$$Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + u_t \tag{21.5.6}$$

which is stationary around the deterministic trend.

FIGURE 21.5 Deterministic versus stochastic trend.

To see the difference between stochastic and deterministic trends, consider Figure 21.5.¹⁴ The series named stochastic in this figure is generated by an RWM with drift, $Y_t = 0.5 + Y_{t-1} + u_t$. Here 500 values of $u_t$ were generated from a standard normal distribution, and the initial value of $Y$ was set at 1. The series named deterministic is generated as $Y_t = 0.5t + u_t$, where the $u_t$ were generated as above and $t$ is time measured chronologically.

As you can see from Figure 21.5, in the case of the deterministic trend, the deviations from the trend line (which represents the nonstationary mean) are purely random and they die out quickly. They do not contribute to the long-run development of the time series, which is determined by the trend component $0.5t$. In the case of the stochastic trend, on the other hand, the random component $u_t$ affects the long-run course of the series $Y_t$.
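The two series behind Figure 21.5 can be generated as described above. The sketch below uses an arbitrary seed, so it will not reproduce the figure exactly. It measures how far each series deviates from the line $0.5t$.

```python
import random

rng = random.Random(11)                           # arbitrary seed
n = 500
u = [rng.gauss(0.0, 1.0) for _ in range(n)]

stochastic, y = [], 1.0                           # Y_0 = 1, as in the text
for ut in u:
    y = 0.5 + y + ut                              # Y_t = 0.5 + Y_{t-1} + u_t
    stochastic.append(y)
deterministic = [0.5 * t + ut for t, ut in enumerate(u, start=1)]   # Y_t = 0.5t + u_t

# Deviations of each series from the line 0.5t
dev_det = [yt - 0.5 * t for t, yt in enumerate(deterministic, start=1)]
dev_sto = [yt - 0.5 * t for t, yt in enumerate(stochastic, start=1)]
print("largest |deviation|, deterministic trend:", round(max(map(abs, dev_det)), 2))
print("largest |deviation|, stochastic trend:   ", round(max(map(abs, dev_sto)), 2))
```

For the deterministic trend the deviation at $t$ is just $u_t$. For the stochastic trend it is $1 + \sum_{s \le t} u_s$, a random walk that wanders away from the line. This is the sense in which the shocks shape the long-run course of $Y_t$.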


21.6 Integrated Stochastic Processes 


The random walk model is but a specific case of a more general class of stochastic processes known as integrated processes. Recall that the RWM without drift is nonstationary, but its first difference, as shown in Eq. (21.3.8), is stationary. Therefore, we call the RWM without drift integrated of order 1, denoted as $I(1)$. Similarly, if a time series has to be differenced twice (i.e., take the first difference of the first differences) to make it stationary, we call such a time series integrated of order 2.¹⁵ In general, if a (nonstationary) time series has to be differenced $d$ times to make it stationary, that time series is said to be integrated of order $d$. A time series $Y_t$ integrated of order $d$ is denoted as $Y_t \sim I(d)$. If a time series $Y_t$ is stationary to begin with (i.e., it does not require any differencing), it is said to be integrated of order zero, denoted by $Y_t \sim I(0)$. Thus, we will use the terms "stationary time series" and "time series integrated of order zero" to mean the same thing.

14 The following discussion is based on Wojciech W. Charemza et al., op. cit., pp. 89-91.

15 For example, if $Y_t$ is $I(2)$, then $\Delta\Delta Y_t = \Delta(Y_t - Y_{t-1}) = \Delta Y_t - \Delta Y_{t-1} = Y_t - 2Y_{t-1} + Y_{t-2}$ will become stationary. But note that $\Delta\Delta Y_t = \Delta^2 Y_t \neq Y_t - Y_{t-2}$.
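A short sketch can check repeated differencing and the algebra in footnote 15. The shock values below are made up for illustration. Cumulating them twice produces an $I(2)$-type series.

```python
def difference(x, d=1):
    """Apply the first-difference operator d times; each pass drops one observation."""
    for _ in range(d):
        x = [b - a for a, b in zip(x[:-1], x[1:])]
    return x

shocks = [0.3, -1.2, 0.8, 0.5, -0.4, 1.1]                      # hypothetical shocks
level1 = [sum(shocks[:k + 1]) for k in range(len(shocks))]    # cumulated once: I(1)-type
level2 = [sum(level1[:k + 1]) for k in range(len(level1))]    # cumulated twice: I(2)-type

dd = difference(level2, d=2)
# Footnote 15: Delta^2 Y_t = Y_t - 2Y_{t-1} + Y_{t-2} ...
explicit = [level2[t] - 2 * level2[t - 1] + level2[t - 2] for t in range(2, len(level2))]
# ... which is not the same thing as Y_t - Y_{t-2}
lag2 = [level2[t] - level2[t - 2] for t in range(2, len(level2))]
print("second differences:", [round(v, 3) for v in dd])
print("Y_t - Y_{t-2}:     ", [round(v, 3) for v in lag2])
```

Differencing twice recovers the original shocks (from the third one on). The "lag-2 difference" $Y_t - Y_{t-2}$ does not.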

Most economic time series are $I(1)$; that is, they generally become stationary only after taking their first differences. Are the time series shown in Figures 21.1 and 21.2 $I(1)$ or of higher order? We will examine them in Sections 21.8 and 21.9.

Properties of Integrated Series 

The following properties of integrated time series may be noted. Let $X_t$, $Y_t$, and $Z_t$ be three time series.

1. If $X_t \sim I(0)$ and $Y_t \sim I(1)$, then $Z_t = (X_t + Y_t) \sim I(1)$; that is, a linear combination or sum of stationary and nonstationary time series is nonstationary.

2. If $X_t \sim I(d)$, then $Z_t = (a + bX_t) \sim I(d)$, where $a$ and $b$ are constants ($b \neq 0$). That is, a linear transformation of an $I(d)$ series is also $I(d)$. Thus, if $X_t \sim I(0)$, then $Z_t = (a + bX_t) \sim I(0)$.

3. If $X_t \sim I(d_1)$ and $Y_t \sim I(d_2)$, then $Z_t = (aX_t + bY_t) \sim I(d_2)$, where $d_1 < d_2$ (and $b \neq 0$).

4. If $X_t \sim I(d)$ and $Y_t \sim I(d)$, then $Z_t = (aX_t + bY_t) \sim I(d^*)$. Here $d^*$ is generally equal to $d$, but in some cases $d^* < d$ (see the topic of cointegration in Section 21.11).

As you can see from the preceding statements, one has to pay careful attention when combining two or more time series that are integrated of different orders.
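Property 4 is easiest to see in a constructed example. In the purely illustrative sketch below, $X_t$ is a random walk and $Y_t = 2X_t + e_t$ with white noise $e_t$. This relationship is a hypothetical one chosen for the demonstration. Both series are $I(1)$, yet the combination $Z_t = Y_t - 2X_t$ is $I(0)$, whereas $X_t + Y_t$ keeps the stochastic trend.

```python
import random

rng = random.Random(3)                                # arbitrary seed
n = 1000
x, level = [], 0.0
for _ in range(n):
    level += rng.gauss(0.0, 1.0)
    x.append(level)                                   # X_t ~ I(1): driftless random walk
e = [rng.gauss(0.0, 1.0) for _ in range(n)]           # e_t ~ I(0): white noise
y = [2.0 * xt + et for xt, et in zip(x, e)]           # Y_t = 2X_t + e_t, also I(1) (property 1)

z = [yt - 2.0 * xt for xt, yt in zip(x, y)]           # a = -2, b = 1: the stochastic trends cancel
w = [xt + yt for xt, yt in zip(x, y)]                 # a = b = 1: the trend survives

print("Z_t equals e_t:", max(abs(zt - et) for zt, et in zip(z, e)) < 1e-9)
print("range of Z:", round(min(z), 2), "to", round(max(z), 2))
print("range of W:", round(min(w), 2), "to", round(max(w), 2))
```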

To see why this is important, consider the two-variable regression model discussed in Chapter 3, namely, $Y_t = \beta_1 + \beta_2 X_t + u_t$. Under the classical OLS assumptions, we know that

$$\hat\beta_2 = \frac{\sum x_t y_t}{\sum x_t^2} \tag{21.6.1}$$

where the small letters, as usual, indicate deviations from mean values. Suppose $Y_t$ is $I(0)$, but $X_t$ is $I(1)$; that is, the former is stationary and the latter is not. Since $X_t$ is nonstationary, its variance will increase indefinitely, so the denominator of Eq. (21.6.1) will dominate the numerator. As a result, $\hat\beta_2$ will converge to zero asymptotically (i.e., in large samples), and it will not even have an asymptotic distribution.¹⁶
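The sketch below applies Eq. (21.6.1) to the setting just described. $Y_t$ is white noise ($I(0)$), $X_t$ is a random walk ($I(1)$), and $\hat\beta_2$ is computed on longer and longer samples. The seed and sample sizes are arbitrary. Watch how fast the denominator $\sum x_t^2$ grows relative to the numerator.

```python
import random

def beta2_hat(x, y):
    """OLS slope in deviation form, Eq. (21.6.1); returns (slope, numerator, denominator)."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den, num, den

rng = random.Random(5)                                 # arbitrary seed
n_max = 10_000
y = [rng.gauss(0.0, 1.0) for _ in range(n_max)]        # Y_t ~ I(0): white noise
x, level = [], 0.0
for _ in range(n_max):
    level += rng.gauss(0.0, 1.0)
    x.append(level)                                    # X_t ~ I(1): random walk

for n in (100, 1_000, 10_000):
    b, num, den = beta2_hat(x[:n], y[:n])
    print(f"n={n:>6}  beta2_hat={b:+.5f}  |numerator|={abs(num):10.1f}  denominator={den:14.1f}")
```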

21.7 The Phenomenon of Spurious Regression 

To see why stationary time series are so important, consider the following two random walk models:

$$Y_t = Y_{t-1} + u_t \tag{21.7.1}$$

$$X_t = X_{t-1} + v_t \tag{21.7.2}$$

where we generated 500 observations of $u_t$ from $u_t \sim N(0, 1)$ and 500 observations of $v_t$ from $v_t \sim N(0, 1)$, and assumed that the initial values of both $Y$ and $X$ were zero. We also assumed that $u_t$ and $v_t$ are serially uncorrelated as well as mutually uncorrelated. As you know by now, both these time series are nonstationary; that is, they are $I(1)$ or exhibit stochastic trends.


16 This point is due to Maddala et al., op. cit., p. 26.
Suppose we regress $Y_t$ on $X_t$. Since $Y_t$ and $X_t$ are uncorrelated $I(1)$ processes, the $R^2$ from the regression of $Y$ on $X$ should tend to zero; that is, there should not be any relationship between the two variables. But wait till you see the regression results:

Variable    Coefficient    Std. Error    t Statistic
C             -13.2556        0.6203        -21.36856
X               0.3376        0.0443          7.61223

$R^2 = 0.1044$    $d = 0.0121$


As you can see, the coefficient of $X$ is highly statistically significant, and, although the $R^2$ value is low, it is statistically significantly different from zero. From these results, you may be tempted to conclude that there is a significant statistical relationship between $Y$ and $X$, whereas a priori there should be none. This is, in a nutshell, the phenomenon of spurious or nonsense regression, first discovered by Yule.¹⁷ Yule showed that (spurious) correlation could persist in nonstationary time series even if the sample is very large. The extremely low Durbin-Watson $d$ value signals that something is wrong with the preceding regression, since it suggests very strong first-order autocorrelation. According to Granger and Newbold, $R^2 > d$ is a good rule of thumb to suspect that the estimated regression is spurious, as in the example above. It may be added that the $R^2$ and the t statistics from such a spurious regression are misleading. The t statistics do not follow (Student's) t distribution and, therefore, cannot be used for testing hypotheses about the parameters.

That the regression results presented above are meaningless can be easily seen from regressing the first differences of $Y_t$ ($= \Delta Y_t$) on the first differences of $X_t$ ($= \Delta X_t$). Remember that although $Y_t$ and $X_t$ are nonstationary, their first differences are stationary. In such a regression you will find that $R^2$ is practically zero, as it should be, and the Durbin-Watson $d$ is about 2. In Exercise 21.24 you are asked to run this regression and verify the statement just made.
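The experiment in Eqs. (21.7.1) and (21.7.2) can be repeated with the sketch below. It fits OLS in levels and in first differences and reports the slope, its t ratio, $R^2$, and the Durbin-Watson $d$. The seed is arbitrary, so the numbers will differ from the table above. The pattern should be similar: in levels, a significant-looking t with $d$ near zero; in differences, $R^2$ near zero with $d$ near 2.

```python
import math
import random

def ols_with_dw(x, y):
    """Regress y on a constant and x; return (slope, t ratio of slope, R^2, Durbin-Watson d)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    tss = sum((yi - ybar) ** 2 for yi in y)
    se_b = math.sqrt(rss / (n - 2) / sxx)
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / rss
    return b, b / se_b, 1.0 - rss / tss, dw

def random_walk(n, rng):
    """Driftless random walk with N(0, 1) shocks and a zero starting value."""
    level, out = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        out.append(level)
    return out

rng = random.Random(21)            # arbitrary seed
Y = random_walk(500, rng)          # Eq. (21.7.1)
X = random_walk(500, rng)          # Eq. (21.7.2), generated independently of Y
levels = ols_with_dw(X, Y)

dY = [b - a for a, b in zip(Y[:-1], Y[1:])]
dX = [b - a for a, b in zip(X[:-1], X[1:])]
diffs = ols_with_dw(dX, dY)

for label, (slope, t_ratio, r2, d) in (("levels", levels), ("first differences", diffs)):
    print(f"{label:>17}: slope {slope:+.4f}  t {t_ratio:+7.2f}  R^2 {r2:.4f}  d {d:.4f}")
```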

Although dramatic, this example is a strong reminder that one should be extremely wary of conducting regression analyses based on time series that exhibit stochastic trends. One should therefore be extremely cautious about reading too much into regression results based on $I(1)$ variables. For an example, see Exercise 21.26. To some extent, this is also true of time series subject to deterministic trends, an example of which is given in Exercise 21.25.

21.8 Tests of Stationarity 

By now the reader probably has a good idea about the nature of stationary stochastic processes and their importance. In practice we face two important questions:

(1) How do we find out if a given time series is stationary?
(2) If we find that a given time series is not stationary, is there a way that it can be made stationary?

We take up the first question in this section and discuss the second question in Section 21.10.

Before we proceed, keep in mind that we are primarily concerned with weak, or covariance, stationarity.

Although there are several tests of stationarity, we discuss only those that are prominent in the literature. In this section we discuss two tests: (1) graphical analysis and (2) the correlogram test. Because of the importance attached to it in the recent past, we discuss the unit root test in the next section. We illustrate these tests with appropriate examples.

17 G. U. Yule, "Why Do We Sometimes Get Nonsense Correlations Between Time Series? A Study in Sampling and the Nature of Time Series," Journal of the Royal Statistical Society, vol. 89, 1926, pp. 1-64. For extensive Monte Carlo simulations on spurious regression, see C. W. J. Granger and P. Newbold, "Spurious Regressions in Econometrics," Journal of Econometrics, vol. 2, 1974, pp. 111-120.

1. Graphical Analysis 

As noted earlier, before one pursues formal tests, it is always advisable to plot the time series under study. We have done this in Figures 21.1 and 21.2 for the U.S. economic time series data posted on the book's website. Such plots give an initial clue about the likely nature of the time series. Take, for instance, the GDP time series shown in Figure 21.1. You will see that over the period of study the log of GDP has been increasing, that is, showing an upward trend. This suggests that the mean of the log of GDP has been changing, and hence that the log of GDP series is probably not stationary. This is also more or less true of the other U.S. economic time series shown in Figure 21.2. Such an intuitive feel is the starting point of more formal tests of stationarity.

2. Autocorrelation Function (ACF) and Correlogram 

One simple test of stationarity is based on the so-called autocorrelation function (ACF). The ACF at lag $k$, denoted by $\rho_k$, is defined as

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\text{covariance at lag } k}{\text{variance}} \tag{21.8.1}$$

where covariance at lag $k$ and variance are as defined before. Note that if $k = 0$, $\rho_0 = 1$ (why?).

Since both covariance and variance are measured in the same units of measurement, $\rho_k$ is a unitless, or pure, number. It lies between $-1$ and $+1$, as any correlation coefficient does. If we plot $\rho_k$ against $k$, the graph we obtain is known as the population correlogram.

Since in practice we only have a realization (i.e., sample) of a stochastic process, we can only compute the sample autocorrelation function (SACF), $\hat\rho_k$. To compute this, we must first compute the sample covariance at lag $k$, $\hat\gamma_k$, and the sample variance, $\hat\gamma_0$, which are defined as¹⁸

$$\hat\gamma_k = \frac{\sum (Y_t - \bar Y)(Y_{t+k} - \bar Y)}{n} \tag{21.8.2}$$

$$\hat\gamma_0 = \frac{\sum (Y_t - \bar Y)^2}{n} \tag{21.8.3}$$

where $n$ is the sample size and $\bar Y$ is the sample mean.

Therefore, the sample autocorrelation function at lag $k$ is

$$\hat\rho_k = \frac{\hat\gamma_k}{\hat\gamma_0} \tag{21.8.4}$$

which is simply the ratio of sample covariance (at lag $k$) to sample variance. A plot of $\hat\rho_k$ against $k$ is known as the sample correlogram.

How does a sample correlogram enable us to find out if a particular time series is stationary? For this purpose, let us first present the sample correlograms of a purely white noise random process and of a random walk process. Return to the driftless RWM (21.3.13). There we generated a sample of 500 error terms, the $u$'s, from the standard normal distribution. The correlogram of these 500 purely random error terms is shown in Figure 21.6, up to 30 lags. We will comment shortly on how one chooses the lag length.

For the time being, just look at the column labeled AC, which is the sample autocorrelation function, and the first diagram on the left, labeled Autocorrelation. The solid vertical line in this diagram represents the zero axis; observations to the right of the line are positive values and those to the left of the line are negative values. As is very clear from this diagram, for a purely white noise process the autocorrelations at various lags hover around zero. This is the picture of a correlogram of a stationary time series. Thus, if the correlogram of an actual (economic) time series resembles the correlogram of a white noise time series, we can say that time series is probably stationary.

18 Strictly speaking, we should divide the sample covariance at lag $k$ by $(n - k)$ and the sample variance by $(n - 1)$ rather than by $n$ (why?), where $n$ is the sample size.

FIGURE 21.6 Correlogram of white noise. AC = autocorrelation, PAC = partial autocorrelation (see Chapter 22), Q-Stat = Q statistic, Prob = probability.

Sample: 2 500
Included observations: 499

Lag      AC      PAC    Q-Stat    Prob
  1   -0.022  -0.022    0.2335   0.629
  2   -0.019  -0.020    0.4247   0.809
  3   -0.009  -0.010    0.4640   0.927
  4   -0.031  -0.031    0.9372   0.919
  5   -0.070  -0.072    3.4186   0.636
  6   -0.008  -0.013    3.4493   0.751
  7    0.048   0.045    4.6411   0.704
  8   -0.069  -0.070    7.0385   0.532
  9    0.022   0.017    7.2956   0.606
 10   -0.004  -0.011    7.3059   0.696
 11    0.024   0.025    7.6102   0.748
 12    0.024   0.027    7.8993   0.793
 13    0.026   0.021    8.2502   0.827
 14   -0.047  -0.046    9.3726   0.806
 15   -0.037  -0.030   10.074    0.815
 16   -0.026  -0.031   10.429    0.843
 17   -0.029  -0.024   10.865    0.863
 18   -0.043  -0.050   11.807    0.857
 19    0.038   0.028   12.575    0.860
 20    0.099   0.093   17.739    0.605
 21    0.001   0.007   17.739    0.665
 22    0.065   0.060   19.923    0.588
 23    0.053   0.055   21.404    0.556
 24   -0.017  -0.004   21.553    0.606
 25   -0.024  -0.005   21.850    0.644
 26   -0.008  -0.008   21.885    0.695
 27   -0.036  -0.027   22.587    0.707
 28    0.053   0.072   24.068    0.678
 29   -0.004  -0.011   24.077    0.725
 30   -0.026  -0.025   24.445    0.752
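Equations (21.8.2) to (21.8.4) translate directly into code. The sketch below computes the sample ACF, dividing both moments by $n$ as in the text, and applies it to 500 simulated standard normal draws. The seed is arbitrary, so the values will differ from Figure 21.6. As a rough benchmark, the sketch flags lags outside $\pm 1.96/\sqrt{n}$; this anticipates Bartlett's result, Eq. (21.8.5).

```python
import random

def sample_acf(y, max_lag):
    """Sample ACF, Eqs. (21.8.2)-(21.8.4): rho_hat_k = gamma_hat_k / gamma_hat_0, k = 0..max_lag.

    Both the lag-k covariance and the variance divide by n, as in the text (see footnote 18).
    """
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    gamma0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / gamma0
            for k in range(max_lag + 1)]

rng = random.Random(8)                              # arbitrary seed
u = [rng.gauss(0.0, 1.0) for _ in range(500)]       # white noise, as behind Figure 21.6
acf = sample_acf(u, 30)
band = 1.96 / len(u) ** 0.5                         # rough 95 percent band, about 0.088
outside = [k for k in range(1, 31) if abs(acf[k]) > band]
print("rho_hat_1..5:", [round(r, 3) for r in acf[1:6]])
print(f"lags with |rho_hat_k| > {band:.3f}:", outside)
```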
FIGURE 21.7 Correlogram of a random walk time series. See Figure 21.6 for definitions.

Sample: 2 500
Included observations: 499

Lag     AC      PAC    Q-Stat   Prob
  1   0.992   0.992   493.86   0.000
  2   0.984   0.000   980.68   0.000
  3   0.976   0.030   1461.1   0.000
  4   0.969   0.005   1935.1   0.000
  5   0.961  -0.059   2402.0   0.000
  6     …     0.050   2862.7   0.000
  7     …     0.004   3317.3   0.000
  8     …     0.040   3766.4   0.000
  9   0.932  -0.009   4210.1   0.000
 10   0.927   0.055   4649.1   0.000
 11   0.921   0.018   5083.9   0.000
 12   0.916   0.039   5514.9   0.000
 13   0.912   0.002   5942.4   0.000
 14   0.908   0.056   6367.0   0.000
 15   0.905   0.061   6789.8   0.000
 16   0.902   0.000   7210.6   0.000
 17     …     0.006   7629.4   0.000
 18     …     0.030   8046.7   0.000
 19     …     0.053   8463.1   0.000
 20     …     0.013   8878.7   0.000
 21     …    -0.041   9292.6   0.000
 22     …    -0.040   9704.1   0.000
 23     …    -0.044   10113.   0.000
 24   0.878  -0.012   10518.   0.000
 25   0.873  -0.023   10920.   0.000
 26     …    -0.041   11317.   0.000
 27     …    -0.055   11709.   0.000
 28     …    -0.045   12095.   0.000
 29   0.846  -0.010   12476.   0.000
 30     …     0.008   12851.   0.000
 31   0.832  -0.006   13221.   0.000
 32   0.825   0.003   13586.   0.000
 33   0.819  -0.006   13946.   0.000

Now look at the correlogram of a random walk series, as generated, say, by Eq. (21.3.13). 
The picture is as shown in Figure 21.7. The most striking feature of this correlogram is that 
the autocorrelation coefficients at various lags are very high even up to a lag of 33 quarters. 
As a matter of fact, if we consider lags of up to 60 quarters, the autocorrelation coefficients 
are quite high; the coefficient is about 0.7 at lag 60. Figure 21.7 is the typical correlogram of 
a nonstationary time series: The autocorrelation coefficient starts at a very high value and 
declines very slowly toward zero as the lag lengthens. 
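The contrast with a random walk can be reproduced by running the same ACF computation on a simulated driftless walk and on its first differences. As before, this is a sketch with an arbitrary seed. Expect a slow decline in the levels and white-noise behavior after differencing.

```python
import random

def sample_acf(y, max_lag):
    """Sample ACF rho_hat_0..rho_hat_max_lag, with both moments divided by n."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    gamma0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / gamma0
            for k in range(max_lag + 1)]

rng = random.Random(17)                        # arbitrary seed
u = [rng.gauss(0.0, 1.0) for _ in range(500)]
walk, level = [], 0.0
for ut in u:
    level += ut
    walk.append(level)                         # driftless random walk, as in Eq. (21.3.13)

acf_walk = sample_acf(walk, 33)
acf_diff = sample_acf([b - a for a, b in zip(walk[:-1], walk[1:])], 33)
for k in (1, 5, 10, 20, 33):
    print(f"lag {k:>2}: random walk {acf_walk[k]:.3f}   first differences {acf_diff[k]:+.3f}")
```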

Now let us take a concrete economic example: the correlogram of the LGDP time series, plotted using the U.S. economic time series data posted on the book's website (see Section 21.1). Figure 21.8 shows this correlogram up to 36 lags. Its pattern is similar to the correlogram of the random walk model in Figure 21.7. The autocorrelation coefficient starts at a very high value at lag 1 (0.977) and declines very slowly. Thus it seems that the LGDP time series is nonstationary. If you plot the correlograms of the other U.S. economic time series shown in Figures 21.1 and 21.2, you will see a similar pattern. This suggests that all these time series are nonstationary; they may be nonstationary in mean or variance or both.

FIGURE 21.8 Correlogram of U.S. LGDP, 1947-I to 2007-IV. See Figure 21.6 for definitions.

Sample: 1947-I 2007-IV
Included observations: 244

Lag     AC     Q-Stat   Prob
  1   0.977   235.73   0.000
  2   0.954   461.43   0.000
  3   0.931   677.31   0.000
  4     …     883.67   0.000
  5     …     1080.9   0.000
  6     …     1269.3   0.000
  7   0.843   1449.3   0.000
  8   0.822   1621.0   0.000
  9   0.801   1784.6   0.000
 10   0.780   1940.6   0.000
 11   0.759   2089.0   0.000
 12   0.738   2230.0   0.000
 13   0.718   2364.1   0.000
 14     …     2491.5   0.000
 15   0.679   2612.4   0.000
 16   0.660   2727.2   0.000
 17   0.642   2836.2   0.000
 18   0.624   2939.6   0.000
 19   0.607   3037.8   0.000
 20   0.590   3130.9   0.000
 21   0.573   3219.3   0.000
 22   0.557   3303.1   0.000
 23   0.541   3382.5   0.000
 24   0.526   3457.9   0.000
 25   0.511   3529.4   0.000
 26   0.496   3597.2   0.000
 27   0.482   3661.4   0.000
 28   0.467   3722.0   0.000
 29   0.453   3779.2   0.000
 30   0.438   3833.1   0.000
 31   0.424   3883.9   0.000
 32   0.411   3931.6   0.000
 33     …     3976.7   0.000
 34   0.385   4019.1   0.000
 35   0.373   4058.9   0.000
 36   0.360   4096.3   0.000

Two practical questions may be posed here. First, how do we choose the lag length to compute the ACF? Second, how do we decide whether a correlation coefficient at a certain lag is statistically significant? The answers follow.

The Choice of Lag Length

This is basically an empirical question. A rule of thumb is to compute the ACF up to one-quarter to one-third the length of the time series. Since we have 244 quarterly observations for our economic data, by this rule lags of 61 to 81 quarters will do. To save space, we have only shown 36 lags in the ACF graph in Figure 21.8. The best practical advice is to start with sufficiently large lags and then reduce them by some statistical criterion. Examples are the Akaike or Schwarz information criteria that we discussed in Chapter 13. Alternatively, one can use the following statistical tests.

Statistical Significance of Autocorrelation Coefficients 

Consider, for instance, the correlogram of the LGDP time series given in Figure 21.8. How do we decide whether the correlation coefficient of 0.780 at lag 10 (quarters) is statistically significant? The statistical significance of any $\hat\rho_k$ can be judged by its standard error. Bartlett has shown that if a time series is purely random, that is, it exhibits white noise (see Figure 21.6), the sample autocorrelation coefficients $\hat\rho_k$ are approximately¹⁹

$$\hat\rho_k \sim N(0, 1/n) \tag{21.8.5}$$

That is, in large samples the sample autocorrelation coefficients are normally distributed with zero mean and variance equal to one over the sample size. Since we have 244 observations, the variance is $1/244 \approx 0.0041$ and the standard error is $\sqrt{0.0041} \approx 0.0640$. Then, following the properties of the standard normal distribution, the 95 percent confidence interval for any (population) $\rho_k$ is

$$\hat\rho_k \pm 1.96(0.0640) = \hat\rho_k \pm 0.1254 \tag{21.8.6}$$

In other words,

$$\Pr(\hat\rho_k - 0.1254 \le \rho_k \le \hat\rho_k + 0.1254) = 0.95 \tag{21.8.7}$$

If the preceding interval includes the value of zero, we do not reject the hypothesis that the true $\rho_k$ is zero. If this interval does not include 0, we reject the hypothesis that the true $\rho_k$ is zero. Apply this to the estimated value $\hat\rho_{10} = 0.780$. The reader can verify that the 95 percent confidence interval for the true $\rho_{10}$ is $(0.780 \pm 0.1254)$, or $(0.6546, 0.9054)$.²⁰ Obviously, this interval does not include the value of zero, so we reject, at the 5 percent level, the hypothesis that the true $\rho_{10}$ is zero.²¹ As you can check, even at lag 20 the estimated $\hat\rho_{20}$ is statistically significant at the 5 percent level.
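The arithmetic of Eqs. (21.8.5) to (21.8.7) and footnote 21 for the LGDP example ($n = 244$, $\hat\rho_{10} = 0.780$) is short enough to script:

```python
import math

n = 244                      # quarterly LGDP observations, 1947-I to 2007-IV
se = 1 / math.sqrt(n)        # Bartlett: rho_hat_k ~ N(0, 1/n) under white noise, Eq. (21.8.5)
half_width = 1.96 * se       # Eq. (21.8.6)

rho10_hat = 0.780            # lag-10 autocorrelation read from Figure 21.8
ci = (rho10_hat - half_width, rho10_hat + half_width)
z = rho10_hat / se           # the Z ratio of footnote 21

print(f"se = {se:.4f}, half-width = {half_width:.4f}")
print(f"95% interval for rho_10: ({ci[0]:.4f}, {ci[1]:.4f}); Z = {z:.2f}")
```

This sketch uses the unrounded standard error, which moves the interval limits slightly (to 0.6545 and 0.9055). The text rounds the standard error to 0.0640 first.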

Instead of testing the statistical significance of any individual autocorrelation coefficient, we can test the joint hypothesis that all the $\rho_k$ up to certain lags are simultaneously equal to zero. This can be done by using the Q statistic developed by Box and Pierce, which is defined as²²

$$Q = n \sum_{k=1}^{m} \hat\rho_k^2 \tag{21.8.8}$$

where $n$ = sample size and $m$ = lag length. The Q statistic is often used as a test of whether a time series is white noise. In large samples, it is approximately distributed as the chi-square distribution with $m$ df. In an application, suppose the computed Q exceeds the critical Q value from the chi-square distribution at the chosen level of significance. Then one can reject the null hypothesis that all the (true) $\rho_k$ are zero; at least some of them must be nonzero.

19 M. S. Bartlett, "On the Theoretical Specification of Sampling Properties of Autocorrelated Time Series," Journal of the Royal Statistical Society, Series B, vol. 27, 1946, pp. 27-41.

20 Our sample size of 244 observations is large enough to justify the normal approximation.

21 Alternatively, divide the estimated value of any $\hat\rho_k$ by its standard error, $1/\sqrt{n}$. For sufficiently large $n$ you will obtain the standard $Z$ value, whose probability can be easily obtained from the standard normal table. Thus for the estimated $\hat\rho_{10} = 0.780$, the $Z$ value is $0.780/0.0640 \approx 12.19$. If the true $\rho_{10}$ were in fact zero, the probability of obtaining a $Z$ value of 12.19 or greater would be practically zero; we therefore reject the hypothesis that the true $\rho_{10}$ is zero.

22 G. E. P. Box and D. A. Pierce, "Distribution of Residual Autocorrelations in Autoregressive Integrated Moving Average Time Series Models," Journal of the American Statistical Association, vol. 65, 1970, pp. 1509-1526.

A variant of the Box-Pierce Q statistic is the Ljung-Box (LB) statistic, which is defined as²³

$$\text{LB} = n(n+2)\sum_{k=1}^{m}\left(\frac{\hat\rho_k^2}{n-k}\right) \sim \chi^2_m \tag{21.8.9}$$

In large samples both the Q and LB statistics follow the chi-square distribution with $m$ df. However, the LB statistic has been found to have better (more powerful, in the statistical sense) small-sample properties than the Q statistic.²⁴
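Both portmanteau statistics are simple functions of the sample autocorrelations. The sketch below codes Eqs. (21.8.8) and (21.8.9). As a check, it applies them to the first five ACs printed in Figure 21.6 ($n = 499$). Those ACs are rounded to three decimals, so the results only approximately match the Q-Stat column. Judging by the numbers, that column appears to report the LB form.

```python
def box_pierce_q(acf, n, m):
    """Q = n * sum_{k=1}^{m} rho_hat_k**2, Eq. (21.8.8); acf[k] holds rho_hat_k, acf[0] = 1."""
    return n * sum(acf[k] ** 2 for k in range(1, m + 1))

def ljung_box(acf, n, m):
    """LB = n(n+2) * sum_{k=1}^{m} rho_hat_k**2 / (n - k), Eq. (21.8.9)."""
    return n * (n + 2) * sum(acf[k] ** 2 / (n - k) for k in range(1, m + 1))

# Lags 0-5 of the white-noise correlogram in Figure 21.6 (rounded to three decimals)
acf_wn = [1.0, -0.022, -0.019, -0.009, -0.031, -0.070]
for m in (1, 5):
    print(f"m={m}: Q = {box_pierce_q(acf_wn, 499, m):.4f}, LB = {ljung_box(acf_wn, 499, m):.4f}")
```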

Returning to the LGDP example given in Figure 21.8, the value of the Q statistic up to lag 36 is about 4096. Under the null hypothesis that all 36 true autocorrelation coefficients are zero, the probability of obtaining such a Q value is practically zero, as the last column of that figure shows. Therefore, the LGDP time series is probably nonstationary, which reinforces our hunch from Figure 21.1. In Exercise 21.16 you are asked to confirm that the other four U.S. economic time series are also nonstationary.
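A p-value for $Q \approx 4096$ with 36 df can be obtained without a statistical library. For an even number of degrees of freedom, the chi-square upper tail has a closed form. The sketch below is an illustrative helper, not a library function. It also evaluates the tail at 50.998, the usual 5 percent critical value for 36 df.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square_df > x) for even df, via the Poisson-sum identity.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    The terms are combined in log space so that large x does not overflow.
    """
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    if half == 0:
        return 1.0
    logs = [i * math.log(half) - math.lgamma(i + 1) for i in range(df // 2)]
    top = max(logs)
    return math.exp(-half + top + math.log(sum(math.exp(v - top) for v in logs)))

print(chi2_sf_even_df(50.998, 36))   # near the 5 percent critical value for 36 df
print(chi2_sf_even_df(4096.3, 36))   # the statistic at lag 36 in Figure 21.8
```

The LGDP statistic is roughly eighty times the 5 percent critical value, and its p-value underflows to zero in double precision.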

21.9 The Unit Root Test 


A test of stationarity (or nonstationarity) that has become widely popular over the past several years is the unit root test. We will first explain it, then illustrate it, and then consider some of its limitations.

The starting point is the unit root (stochastic) process that we discussed in Section 21.4. We start with

$$Y_t = \rho Y_{t-1} + u_t \qquad -1 \le \rho \le 1 \tag{21.4.1}$$

where $u_t$ is a white noise error term.

We know that if $\rho = 1$, that is, in the case of the unit root, Eq. (21.4.1) becomes a random walk model without drift, which we know is a nonstationary stochastic process. Therefore, why not simply regress $Y_t$ on its (one-period) lagged value $Y_{t-1}$ and find out if the estimated $\rho$ is statistically equal to 1? If it is, then $Y_t$ is nonstationary. This is the general idea behind the unit root test of stationarity.

However, we cannot estimate Eq. (21.4.1) by OLS and test the hypothesis that $\rho = 1$ by the usual t test, because that test is severely biased in the case of a unit root. Therefore, we manipulate Eq. (21.4.1) as follows. Subtract $Y_{t-1}$ from both sides of Eq. (21.4.1) to obtain

$$\begin{aligned} Y_t - Y_{t-1} &= \rho Y_{t-1} - Y_{t-1} + u_t \\ &= (\rho - 1) Y_{t-1} + u_t \end{aligned} \tag{21.9.1}$$

which can be alternatively written as

$$\Delta Y_t = \delta Y_{t-1} + u_t \tag{21.9.2}$$

where $\delta = (\rho - 1)$ and $\Delta$, as usual, is the first difference operator.
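The reparameterization leading to Eq. (21.9.2) is exact, not approximate. Regressing $\Delta Y_t$ on $Y_{t-1}$ (no intercept) gives $\hat\delta = \hat\rho - 1$, where $\hat\rho$ comes from regressing $Y_t$ on $Y_{t-1}$. The sketch below verifies this on a simulated AR(1) with the illustrative value $\rho = 0.8$.

```python
import random

def slope_through_origin(x, y):
    """OLS slope of y on x with no intercept: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

rng = random.Random(13)                      # arbitrary seed
y, series = 0.0, []
for _ in range(300):
    y = 0.8 * y + rng.gauss(0.0, 1.0)        # illustrative AR(1) with rho = 0.8
    series.append(y)

y_lag, y_now = series[:-1], series[1:]
rho_hat = slope_through_origin(y_lag, y_now)                                      # Eq. (21.4.1)
delta_hat = slope_through_origin(y_lag, [b - a for a, b in zip(y_lag, y_now)])    # Eq. (21.9.2)
print(f"rho_hat = {rho_hat:.4f}, delta_hat = {delta_hat:.4f}, rho_hat - 1 = {rho_hat - 1:.4f}")
```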

23 G. M. Ljung and G. E. P. Box, "On a Measure of Lack of Fit in Time Series Models," Biometrika, 
vol. 66, 1978, pp. 66-72. 

24 The Q and LB statistics may not be appropriate in every case. For a critique, see Maddala et al., 
op. cit., p. 19. 
In practice, therefore, instead of estimating Eq. (21.4.1), we estimate Eq. (21.9.2) and test the (null) hypothesis that $\delta = 0$, the alternative hypothesis being that $\delta < 0$ (see footnote 25). If $\delta = 0$, then $\rho = 1$; that is, we have a unit root, meaning the time series under consideration is nonstationary.

Before we proceed to estimate Eq. (21.9.2), it may be noted that if $\delta = 0$, Eq. (21.9.2) will become

$$\Delta Y_t = (Y_t - Y_{t-1}) = u_t \tag{21.9.3}$$

Since $u_t$ is a white noise error term, it is stationary. This means that the first differences of a random walk time series are stationary, a point we have already made.

Now let us turn to the estimation of Eq. (21.9.2). This is simple enough: take the first differences of $Y_t$, regress them on $Y_{t-1}$, and see whether the estimated slope coefficient in this regression ($= \hat\delta$) is zero. If it is zero, we conclude that $Y_t$ is nonstationary. But if it is negative, we conclude that $Y_t$ is stationary.²⁵ The only question is which test we use to find out if the estimated coefficient of $Y_{t-1}$ in Eq. (21.9.2) is zero or not. You might be tempted to say, why not use the usual t test? Unfortunately, under the null hypothesis that $\delta = 0$ (i.e., $\rho = 1$), the t value of the estimated coefficient of $Y_{t-1}$ does not follow the t distribution even in large samples; that is, it does not have an asymptotic normal distribution.

What is the alternative? Dickey and Fuller have shown that under the null hypothesis that $\delta = 0$, the estimated t value of the coefficient of $Y_{t-1}$ in Eq. (21.9.2) follows the $\tau$ (tau) statistic. 26 These authors have computed the critical values of the tau statistic on the basis of Monte Carlo simulations. A sample of these critical values is given in Appendix D, Table D.7. The table is limited, but MacKinnon has prepared more extensive tables, which are now incorporated in several econometric packages. 27 In the literature the tau statistic or test is known as the Dickey-Fuller (DF) test, in honor of its discoverers. Interestingly, if the hypothesis that $\delta = 0$ is rejected (i.e., the time series is stationary), we can use the usual (Student’s) t test. Keep in mind that the Dickey-Fuller test is one-sided because the alternative hypothesis is that $\delta < 0$ (or $\rho < 1$).
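To see why the usual t table fails, here is a small Monte Carlo sketch in the spirit of, but not reproducing, the Dickey-Fuller simulations. It assumes standard normal shocks, $Y_0 = 0$, 100 observations per sample, 2,000 replications, and the intercept form of the test regression (Eq. (21.9.4), introduced in the next paragraphs). The function names are hypothetical. Under the null of a pure random walk it collects the t ratio of $\hat{\delta}$ and reports its 5 percent quantile.

```python
import math
import random


def tau_with_intercept(y):
    """t ratio of delta in Delta Y_t = beta_1 + delta * Y_{t-1} + u_t (Eq. 21.9.4)."""
    x = y[:-1]                                    # Y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # Delta Y_t
    n = len(x)
    xbar, dbar = sum(x) / n, sum(dy) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    delta = sum((v - xbar) * (d - dbar) for v, d in zip(x, dy)) / sxx
    beta1 = dbar - delta * xbar
    rss = sum((d - beta1 - delta * v) ** 2 for v, d in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)
    return delta / se


def simulate_null_taus(reps=2000, T=100, seed=42):
    """Sorted tau statistics from `reps` pure random walks (delta = 0), each with Y_0 = 0."""
    rng = random.Random(seed)
    taus = []
    for _ in range(reps):
        y = [0.0]
        for _ in range(T):
            y.append(y[-1] + rng.gauss(0.0, 1.0))
        taus.append(tau_with_intercept(y))
    return sorted(taus)


taus = simulate_null_taus()
q05 = taus[int(0.05 * len(taus))]                     # simulated 5 percent critical value
share_below_normal = sum(t < -1.645 for t in taus) / len(taus)
print(f"simulated 5% quantile of tau: {q05:.2f} (standard normal: -1.645)")
print(f"share of null taus below -1.645: {share_below_normal:.2f} (normal theory: 0.05)")
```

With these settings the simulated 5 percent quantile should land near the -2.88 that Table D.7 gives for the intercept case with 250 observations (quoted later in this section). It should fall far below the normal value of -1.645. The exact figure depends on the seed and the sample size.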

The actual procedure of implementing the DF test involves several decisions. In discussing the nature of the unit root process in Sections 21.4 and 21.5, we noted that a random walk process may have no drift, or it may have drift, or it may have both deterministic and stochastic trends. To allow for the various possibilities, the DF test is estimated in three different forms, that is, under three different null hypotheses.


$Y_t$ is a random walk:

$$\Delta Y_t = \delta Y_{t-1} + u_t \qquad (21.9.2)$$

$Y_t$ is a random walk with drift:

$$\Delta Y_t = \beta_1 + \delta Y_{t-1} + u_t \qquad (21.9.4)$$

$Y_t$ is a random walk with drift around a deterministic trend:

$$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + u_t \qquad (21.9.5)$$


25 Since $\delta = (\rho - 1)$, for stationarity $\rho$ must be less than one. For this to happen $\delta$ must be negative.
26 D. A. Dickey and W. A. Fuller, "Distribution of the Estimators for Autoregressive Time Series with a Unit Root," Journal of the American Statistical Association, vol. 74, 1979, pp. 427-431. See also W. A. Fuller, Introduction to Statistical Time Series, John Wiley & Sons, New York, 1976.

27 J. G. MacKinnon, "Critical Values for Cointegration Tests," in R. F. Engle and C. W. J. Granger, eds., Long-Run Economic Relationships: Readings in Cointegration, Chapter 13, Oxford University Press, New York, 1991.
where $t$ is the time or trend variable. In each case the hypotheses are:

Null hypothesis: $H_0\colon \delta = 0$ (i.e., there is a unit root, or the time series is nonstationary, or it has a stochastic trend).

Alternative hypothesis: $H_1\colon \delta < 0$ (i.e., the time series is stationary, possibly around a deterministic trend). 28

If the null hypothesis is rejected, it means either (1) $Y_t$ is stationary with zero mean, in the case of Eq. (21.9.2), or (2) $Y_t$ is stationary with nonzero mean, in the case of Eq. (21.9.4). In the case of Eq. (21.9.5), we can test for $\delta < 0$ (i.e., no stochastic trend) and $\beta_2 \neq 0$ (i.e., the existence of a deterministic trend) simultaneously, using the F test, but using the critical values tabulated by Dickey and Fuller. It may be noted that a time series may contain both a stochastic and a deterministic trend.

It is extremely important to note that the critical values of the tau test to test the hypothesis that $\delta = 0$ are different for each of the preceding three specifications of the DF test, which can be seen clearly from Appendix D, Table D.7. Moreover, if, say, specification (21.9.4) is correct, but we estimate Eq. (21.9.2), we will be committing a specification error, whose consequences we already know from Chapter 13. The same is true if we estimate Eq. (21.9.4) rather than the true Eq. (21.9.5). Of course, there is no way of knowing which specification is correct to begin with. Some trial and error is inevitable, data mining notwithstanding.

The actual estimation procedure is as follows: Estimate Eq. (21.9.2), Eq. (21.9.4), or Eq. (21.9.5) by OLS; divide the estimated coefficient of $Y_{t-1}$ in each case by its standard error to compute the tau ($\tau$) statistic; and refer to the DF tables (or any statistical package). If the computed absolute value of the tau statistic ($|\tau|$) exceeds the absolute DF or MacKinnon critical tau values, we reject the hypothesis that $\delta = 0$, in which case the time series is stationary. On the other hand, if the computed $|\tau|$ does not exceed the absolute critical tau value, we do not reject the null hypothesis, in which case the time series is nonstationary. Make sure that you use the appropriate critical $\tau$ values. In most applications the tau value will be negative. Therefore, alternatively we can say that if the computed (negative) tau value is smaller than (i.e., more negative than) the critical tau value, we reject the null hypothesis (i.e., the time series is stationary); otherwise, we do not reject it (i.e., the time series is nonstationary).
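The procedure just described can be sketched in plain Python. This is a minimal illustration, not a replacement for a statistical package. The helper names (`ols`, `df_tau`, `is_stationary`), the simulated AR(1) data with $\rho = 0.3$, and the seed are assumptions. The critical values are the 5 percent, 250-observation entries of Table D.7 quoted later in this section (-1.95, -2.88, -3.43). The sketch builds each of the three DF regressions, computes $\tau = \hat{\delta}/\text{se}(\hat{\delta})$, and applies the one-sided rule.

```python
import math
import random


def ols(X, y):
    """OLS of y on the columns of X (a list of rows); returns (beta, se, residuals)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se, resid


def df_tau(y, spec):
    """Tau for delta in one DF regression: "none" = Eq. (21.9.2),
    "drift" = Eq. (21.9.4), "trend" = Eq. (21.9.5)."""
    rows, dy = [], []
    for t in range(1, len(y)):
        row = [y[t - 1]]              # Y_{t-1}; its coefficient is delta
        if spec in ("drift", "trend"):
            row.append(1.0)           # intercept beta_1
        if spec == "trend":
            row.append(float(t))      # trend term beta_2 * t
        rows.append(row)
        dy.append(y[t] - y[t - 1])    # Delta Y_t
    beta, se, _ = ols(rows, dy)
    return beta[0] / se[0]            # tau = delta_hat / se(delta_hat)


# 5 percent critical tau values for 250 observations, as quoted from Table D.7 in the text.
CRIT_5PCT = {"none": -1.95, "drift": -2.88, "trend": -3.43}


def is_stationary(tau, spec):
    """One-sided rule: reject delta = 0 only if tau is more negative than the critical value."""
    return tau < CRIT_5PCT[spec]


# Hypothetical data: a stationary AR(1) with rho = 0.3 and 250 observations after Y_0 = 0.
rng = random.Random(0)
y = [0.0]
for _ in range(250):
    y.append(0.3 * y[-1] + rng.gauss(0.0, 1.0))

taus = {spec: df_tau(y, spec) for spec in CRIT_5PCT}
for spec, tau in taus.items():
    print(f"{spec:>5}: tau = {tau:.2f}, stationary at 5%: {is_stationary(tau, spec)}")
```

For a different sample size or significance level, the dictionary of critical values would have to be replaced with the appropriate DF or MacKinnon values.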

Let us return to the U.S. GDP time series. For this series, the results of the three regressions (21.9.2), (21.9.4), and (21.9.5) are as follows: The dependent variable in each case is $\Delta Y_t = \Delta\text{LGDP}_t$, where LGDP is the logarithm of real GDP.


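$$\widehat{\Delta\text{LGDP}}_t = 0.000968\,\text{LGDP}_{t-1} \qquad (21.9.6)$$

t = (12.9270)    R² = 0.0147    d = 1.3194

$$\widehat{\Delta\text{LGDP}}_t = 0.0221 - 0.00165\,\text{LGDP}_{t-1} \qquad (21.9.7)$$

t = (2.4342) (-1.5294)    R² = 0.0096    d = 1.3484

$$\widehat{\Delta\text{LGDP}}_t = 0.2092 + 0.0002\,t - 0.0269\,\text{LGDP}_{t-1} \qquad (21.9.8)$$

t = (1.8991) (1.7040) (-1.8102)    R² = 0.0215    d = 1.3308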


28 We rule out the possibility that $\delta > 0$, because in that case $\rho > 1$, in which case the underlying time series will be explosive.
Our primary interest in all these regressions is in the t ($= \tau$) value of the $\text{LGDP}_{t-1}$ coefficient. If you look at Table D.7 in Appendix D, you will see that the 5 percent critical tau values for sample size 250 (the closest number to our sample of 244 observations) are -1.95 (no intercept, no trend), -2.88 (intercept but no trend), and -3.43 (intercept as well as trend). EViews and other statistical packages provide critical values for the sample size used in the analysis.

Before we examine the results, we have to decide which of the three models may be appropriate. We should rule out model (21.9.6) because the coefficient of $\text{LGDP}_{t-1}$, which is equal to $\delta$, is positive. But since $\delta = (\rho - 1)$, a positive $\delta$ would imply that $\rho > 1$. Although a theoretical possibility, we rule this out because in this case the LGDP time series would be explosive. 29 That leaves us with models (21.9.7) and (21.9.8). In both cases the estimated $\delta$ coefficient is negative, implying that the estimated $\rho$ is less than 1. For these two models, the estimated $\rho$ values are 0.9984 and 0.9731, respectively. The only question now is whether these values are statistically significantly below 1 for us to declare that the GDP time series is stationary.

For model (21.9.7) the estimated $\tau$ value is -1.5294, whereas the 5 percent critical $\tau$ value, as noted above, is -2.88. Since, in absolute terms, the former is smaller than the latter, our conclusion is that the LGDP time series is not stationary. 30

The story is the same for model (21.9.8). The computed $\tau$ value of -1.8102, in absolute terms, is smaller than the 5 percent critical value of -3.43.
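The decision for the two surviving U.S. LGDP regressions reduces to simple arithmetic. The sketch below uses only the estimates and 5 percent critical values reported in the text. It recovers $\hat{\rho} = 1 + \hat{\delta}$ and checks whether each $\tau$ is more negative than its critical value.

```python
# Estimates and 5 percent critical values (n = 250) as reported in the text.
results = {
    "21.9.7": {"delta": -0.00165, "tau": -1.5294, "crit_5pct": -2.88},  # intercept, no trend
    "21.9.8": {"delta": -0.0269, "tau": -1.8102, "crit_5pct": -3.43},   # intercept and trend
}

decisions = {}
for model, r in results.items():
    rho_hat = 1.0 + r["delta"]              # delta = rho - 1, so rho = 1 + delta
    reject = r["tau"] < r["crit_5pct"]      # one-sided: reject only if tau is more negative
    decisions[model] = (rho_hat, reject)
    verdict = "stationary" if reject else "unit root not rejected"
    print(f"({model}) rho_hat = {rho_hat:.4f}, tau = {r['tau']}, "
          f"5% critical value = {r['crit_5pct']}: {verdict}")
```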

Therefore, on the basis of graphical analysis, the correlogram, and the Dickey-Fuller 
test, the conclusion is that for the quarterly periods of 1947 to 2007, the U.S. LGDP time 
series was nonstationary; i.e., it contained a unit root, or it had a stochastic trend. 

The Augmented Dickey-Fuller (ADF) Test 

In conducting the DF test as in Eqs. (21.9.2), (21.9.4), and (21.9.5), it was assumed that the error term $u_t$ was uncorrelated. But in case the $u_t$ are correlated, Dickey and Fuller have developed another test, known as the augmented Dickey-Fuller (ADF) test. This test is conducted by “augmenting” the preceding three equations by adding the lagged values of the dependent variable $\Delta Y_t$. To be specific, suppose we use Eq. (21.9.5). The ADF test here consists of estimating the following regression:

$$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + \varepsilon_t \qquad (21.9.9)$$

where $\varepsilon_t$ is a pure white noise error term and where $\Delta Y_{t-1} = (Y_{t-1} - Y_{t-2})$, $\Delta Y_{t-2} = (Y_{t-2} - Y_{t-3})$, etc. The number of lagged difference terms to include is often determined empirically, the idea being to include enough terms so that the error term in Eq. (21.9.9) is serially uncorrelated, so that we can obtain an unbiased estimate of $\delta$, the coefficient of lagged $Y_{t-1}$. EViews 6 has an option that automatically selects the lag length based on Akaike, Schwarz, and other information criteria. In ADF we still test whether $\delta = 0$, and the ADF test follows the same asymptotic distribution as the DF statistic, so the same critical values can be used.
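The sketch below shows one way the ADF regression (21.9.9) could be assembled. It assumes plain Python, a simulated stationary AR(1) series ($\rho = 0.5$, 1,000 observations) rather than the LGDP data, and hypothetical helper names `ols` and `adf_tau`. Each row holds $Y_{t-1}$, an intercept, the trend $t$, and the chosen number of lagged differences. Every extra lag costs one observation at the start of the sample.

```python
import math
import random


def ols(X, y):
    """OLS of y on the columns of X (a list of rows); returns (beta, se, residuals)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se, resid


def adf_tau(y, lags):
    """Tau for delta in Eq. (21.9.9): Delta Y_t on Y_{t-1}, 1, t and `lags` lagged differences."""
    dy = [y[i] - y[i - 1] for i in range(1, len(y))]      # dy[t - 1] is Delta Y_t
    rows, target = [], []
    for t in range(lags + 1, len(y)):
        row = [y[t - 1], 1.0, float(t)]                     # Y_{t-1}, intercept, trend
        row += [dy[t - 1 - i] for i in range(1, lags + 1)]  # Delta Y_{t-1}, ..., Delta Y_{t-lags}
        rows.append(row)
        target.append(dy[t - 1])
    beta, se, _ = ols(rows, target)
    return beta[0] / se[0], len(target)


# Hypothetical stationary AR(1) series (rho = 0.5), 1,000 observations after Y_0 = 0.
rng = random.Random(2)
y = [0.0]
for _ in range(1000):
    y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))

tau, n_used = adf_tau(y, lags=4)   # four lags, as in the quarterly LGDP example
print(f"ADF tau = {tau:.2f} using {n_used} observations")
```

Automatic lag selection by the Akaike or Schwarz criterion, as offered by EViews, would wrap a function like this in a loop over candidate lag lengths. That loop is not shown. The resulting $\tau$ is compared with the same DF critical values as before.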

To give a glimpse of this procedure, we estimated Eq. (21.9.9) for the LGDP series. Since we have quarterly data, we decided to use four lags. The results of the ADF regression are as follows: 31

29 More technically, since Eq. (21.9.2) is a first-order difference equation, the so-called stability condition requires that $|\rho| < 1$.

30 Another way of stating this is that the computed $\tau$ value should be more negative than the critical $\tau$ value, which is not the case here. Hence the conclusion stays. Since in general $\delta$ is expected to be negative, the estimated $\tau$ statistic will have a negative sign. Therefore, a large negative $\tau$ value is generally an indication of stationarity.

31 Higher-order lagged differences were considered but they were insignificant. 
$$\widehat{\Delta\text{LGDP}}_t = 0.2677 + 0.0003\,t - 0.0352\,\text{LGDP}_{t-1} + 0.2990\,\Delta\text{LGDP}_{t-1} + 0.1451\,\Delta\text{LGDP}_{t-2} - 0.0621\,\Delta\text{LGDP}_{t-3} - 0.0876\,\Delta\text{LGDP}_{t-4} \qquad (21.9.10)$$

t = (2.4130) (2.2561) (-2.3443) (4.6255) (2.1575) (-0.9205) (-1.3438)

R² = 0.1617    d = 2.0075

The t ($= \tau$) value of the lagged $\text{LGDP}_{t-1}$ coefficient ($= \delta$) is -2.3443, which in absolute terms is much less than even the 10 percent critical $\tau$ value of -3.1378, again suggesting that even after taking care of possible autocorrelation in the error term, the LGDP series is nonstationary. (Note: The @trend command in EViews automatically generates the time or trend variable.)

Could this be the result of our choosing only four lagged values of $\Delta\text{LGDP}$? We used the Schwarz criterion with 14 lagged values of $\Delta\text{LGDP}$, which gave a tau value of -1.8102. Even then, this tau value was not significant at the 10 percent level (the critical tau value at this level was -3.1376). It seems logged GDP is nonstationary.

Testing the Significance of More than One Coefficient: The F Test

Suppose we estimate model (21.9.5) and test the hypothesis that $\beta_1 = \beta_2 = 0$, that is, the model is RWM without drift and trend. To test this joint hypothesis, we can use the restricted F test discussed in Chapter 8. That is, we estimate Eq. (21.9.5) (the unrestricted regression) and then estimate Eq. (21.9.5) again, dropping the intercept and trend. Then we use the restricted F test as shown in Eq. (8.6.9), except that we cannot use the conventional F table to get the critical F values. As they did with the $\tau$ statistic, Dickey and Fuller have developed critical F values for this situation, a sample of which is given in Appendix D, Table D.7. An example is presented in Exercise 21.27.
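The joint test just described relies on the restricted F statistic in its residual-sum-of-squares form, $F = [(RSS_R - RSS_{UR})/m]/[RSS_{UR}/(n-k)]$. Here there are $m = 2$ restrictions and $k = 3$ parameters in Eq. (21.9.5). The sketch below computes it for a simulated random walk; the data, seed, and helper names are hypothetical. Only the statistic is computed. As the text stresses, it must be compared with the Dickey-Fuller F critical values, which are not reproduced here.

```python
import math
import random


def ols(X, y):
    """OLS of y on the columns of X (a list of rows); returns (beta, se, residuals)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se, resid


def restricted_f(rss_r, rss_ur, m, n, k):
    """Restricted F statistic: [(RSS_R - RSS_UR) / m] / [RSS_UR / (n - k)]."""
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))


# Hypothetical random walk without drift, 200 observations after Y_0 = 0.
rng = random.Random(4)
y = [0.0]
for _ in range(200):
    y.append(y[-1] + rng.gauss(0.0, 1.0))
dy = [y[t] - y[t - 1] for t in range(1, len(y))]

unrestricted = [[1.0, float(t), y[t - 1]] for t in range(1, len(y))]  # Eq. (21.9.5)
restricted = [[y[t - 1]] for t in range(1, len(y))]                   # beta_1 = beta_2 = 0
rss_ur = sum(e * e for e in ols(unrestricted, dy)[2])
rss_r = sum(e * e for e in ols(restricted, dy)[2])
F = restricted_f(rss_r, rss_ur, m=2, n=len(dy), k=3)
print(f"F = {F:.3f} (compare with Dickey-Fuller F tables, not the usual F table)")
```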

The Phillips-Perron (PP) Unit Root Tests 32 

An important assumption of the DF test is that the error terms $u_t$ are independently and identically distributed. The ADF test adjusts the DF test to take care of possible serial correlation in the error terms by adding the lagged difference terms of the regressand. Phillips and Perron use nonparametric statistical methods to take care of the serial correlation in the error terms without adding lagged difference terms. Since the asymptotic distribution of the PP test statistic is the same as that of the ADF test statistic, we will not pursue this topic here.

Testing for Structural Changes 

The macroeconomic data introduced in Section 21.1 (see the book’s website for the actual data) are for the period 1947-2007, a period of 61 years. In this period the U.S. economy experienced several business cycles of varying durations. Business cycles are marked by periods of recessions and periods of expansions. It is quite likely that one business cycle is different from another, which may reflect structural breaks or structural changes in the economy.

For instance, take the first oil embargo in 1973. It quadrupled oil prices. Prices again increased substantially after the second oil embargo in 1979. Naturally, these shocks will affect economic behavior. Therefore, if we were to regress personal consumption expenditure (PCE) on disposable personal income (DPI), the intercept, the slope, or both are likely to change from one business cycle to another (recall the Chow test of structural breaks). This is what is meant by structural changes.


32 P. C. B. Phillips and P. Perron, "Testing for a Unit Root in Time Series Regression," Biometrika, 
vol. 75, 1988, pp. 335-346. The PP test is now included in several software packages. 
Perron, for instance, has argued that the standard tests of the unit root hypothesis may not be reliable in the presence of structural changes. 33 There are ways to test for structural changes and to account for them, the simplest involving the use of dummy variables. But a discussion of the various tests of structural breaks will take us far afield and is best left for the references. 34 However, see Exercise 21.28.

A Critique of the Unit Root Tests 35 

We have discussed several unit root tests and there are several more. The question is: Why are 
there so many unit root tests? The answer lies in the size and power of these tests. By size of a 
test we mean the level of significance (i.e., the probability of committing a Type I error) and by 
power of a test we mean the probability of rejecting the null hypothesis when it is false. The 
power of a test is calculated by subtracting the probability of a Type II error from 1; Type II error 
is the probability of accepting a false null hypothesis. The maximum power is 1. Most unit root 
tests are based on the null hypothesis that the time series under consideration has a unit root; 
that is, it is nonstationary. The alternative hypothesis is that the time series is stationary. 

Size of Test 

You will recall from Chapter 13 the distinction we made between the nominal and the true levels of significance. The DF test is sensitive to the way it is conducted. Remember that we discussed three varieties of the DF test: (1) a pure random walk, (2) a random walk with drift, and (3) a random walk with drift and trend. If, for example, the true model is (1) but we estimate (2), and conclude that, say, at the 5 percent level the time series is stationary, this conclusion may be wrong because the true level of significance in this case is much larger than 5 percent. 36 The size distortion could also result from excluding moving average (MA) components from the model (on moving average, see Chapter 22).

Power of Test 

Most tests of the DF type have low power; that is, they tend to accept the null of unit root more frequently than is warranted. That is, these tests may find a unit root even when none exists. There are several reasons for this. First, the power depends on the (time) span of the data more than the mere size of the sample. For a given sample size n, the power is greater when the span is large. Thus, a unit root test based on 30 observations over a span of 30 years may have more power than one based on, say, 100 observations over a span of 100 days. Second, if $\rho \approx 1$ but not exactly 1, the unit root test may declare such a time series nonstationary. Third, these types of tests assume a single unit root; that is, they assume that the given time series is I(1). But if a time series is integrated of order higher than 1, say, I(2), there will be more than one unit root. In the latter case one may use the Dickey-Pantula test. 37 Fourth, if there are structural breaks in a time series (see the chapter on dummy variables) due to, say, the OPEC oil embargoes, the unit root tests may not catch them.
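The low-power point, especially the second reason, can be illustrated with another small Monte Carlo sketch. The assumptions are: samples of 50 observations, standard normal shocks, $Y_0 = 0$, and the intercept form of the DF regression. The 5 percent critical value is itself simulated under $\rho = 1$ rather than taken from Table D.7. Power is then estimated as the share of rejections when the true $\rho$ is 0.99, 0.95, 0.90, or 0.70. Names and settings are illustrative.

```python
import math
import random


def tau_with_intercept(y):
    """t ratio of delta in Delta Y_t = beta_1 + delta * Y_{t-1} + u_t (Eq. 21.9.4)."""
    x = y[:-1]                                    # Y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # Delta Y_t
    n = len(x)
    xbar, dbar = sum(x) / n, sum(dy) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    delta = sum((v - xbar) * (d - dbar) for v, d in zip(x, dy)) / sxx
    beta1 = dbar - delta * xbar
    rss = sum((d - beta1 - delta * v) ** 2 for v, d in zip(x, dy))
    return delta / math.sqrt(rss / (n - 2) / sxx)


def simulate_taus(rho, reps, T, rng):
    """Tau statistics from `reps` AR(1) samples Y_t = rho * Y_{t-1} + u_t with Y_0 = 0."""
    taus = []
    for _ in range(reps):
        y = [0.0]
        for _ in range(T):
            y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
        taus.append(tau_with_intercept(y))
    return taus


rng = random.Random(7)
T = 50
null_taus = sorted(simulate_taus(1.0, 2000, T, rng))
crit = null_taus[int(0.05 * len(null_taus))]      # simulated 5 percent critical value
power = {}
for rho in (0.99, 0.95, 0.90, 0.70):
    taus = simulate_taus(rho, 1000, T, rng)
    power[rho] = sum(t < crit for t in taus) / len(taus)   # share of correct rejections

print(f"simulated 5% critical tau (T = {T}): {crit:.2f}")
for rho, p in power.items():
    print(f"true rho = {rho:.2f}: estimated power = {p:.2f}")
```

With $\rho = 0.99$ the rejection rate should be barely above the 5 percent size, and it rises as $\rho$ moves away from 1. The exact numbers vary with the seed.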

In applying the unit root tests one should therefore keep in mind the limitations of the tests. Of course, there have been modifications of these tests by Perron and Ng, Elliott,

33 P. Perron, "The Great Crash, the Oil Price Shock and the Unit Root Hypothesis," Econometrica, 
vol. 57, 1989, pp. 1361-1401. 

34 For an accessible discussion, see James H. Stock and Mark W. Watson, Introduction to Econometrics, 2d ed., Pearson/Addison-Wesley, Boston, 2007, pp. 565-571. For a more thorough discussion, see G. S. Maddala and In-Moo Kim, Unit Roots, Cointegration, and Structural Change, Cambridge University Press, New York, 1998.

35 For detailed discussion, see Terence C. Mills, op. cit., pp. 87-88.

36 For a Monte Carlo experiment about this, see Charemza et al., op. cit., p. 114. 

37 D. A. Dickey and S. Pantula, "Determining the Order of Differencing in Autoregressive Processes," Journal of Business and Economic Statistics, vol. 5, 1987, pp. 455-461.
Rothenberg and Stock, Fuller, and Leybourne. 38 Because of this, Maddala and Kim advocate that the traditional DF, ADF, and PP tests should be discarded. As econometric software packages incorporate the new tests, that may very well happen. But it should be added that as yet there is no uniformly powerful test of the unit root hypothesis.


21.10 Transforming Nonstationary Time Series 


Now that we know the problems associated with nonstationary time series, the practical question is what to do. To avoid the spurious regression problem that may arise from regressing a nonstationary time series on one or more nonstationary time series, we have to transform nonstationary time series to make them stationary. The transformation method depends on whether the time series are difference stationary (DSP) or trend stationary (TSP). We consider each of these methods in turn.

Difference-Stationary Processes 

If a time series has a unit root, the first differences of such time series are stationary. 39 
Therefore, the solution here is to take the first differences of the time series. 
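Differencing is mechanical, as the minimal sketch below shows. The helper `difference` is hypothetical, and the short series are made up purely to show the arithmetic; the quadratic path is deterministic, not a unit root process. Applying the operator d times handles an I(d) series (footnote 39). Differencing logs gives approximate growth rates, the kind of series plotted as DLGDP in Figure 21.9.

```python
import math


def difference(series, d=1):
    """Apply the first-difference operator d times (an I(d) series needs d differences)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out[:-1], out[1:])]
    return out


levels = [1, 4, 9, 16, 25, 36]      # made-up series with a quadratic path
first = difference(levels)          # [3, 5, 7, 9, 11]
second = difference(levels, d=2)    # [2, 2, 2, 2]

# With logged data, first differences approximate period-to-period growth rates.
gdp = [100.0, 101.0, 102.5, 103.0]  # hypothetical levels
growth = difference([math.log(v) for v in gdp])
print(first, second, [round(g, 4) for g in growth])
```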

Returning to our U.S. LGDP time series, we have already seen that it has a unit root. Let 
us now see what happens if we take the first differences of the LGDP series. 

Let $\Delta\text{LGDP}_t = (\text{LGDP}_t - \text{LGDP}_{t-1})$. For convenience, let $D_t = \Delta\text{LGDP}_t$. Now consider the following regression:

$$\widehat{\Delta D}_t = 0.00557 - 0.6711\,D_{t-1} \qquad (21.10.1)$$

t = (7.1407) (-11.0204)    R² = 0.3360    d = 2.0542

The 1 percent critical DF $\tau$ value is -3.4574. Since the computed $\tau$ ($= t$) of -11.0204 is more negative than the critical value, we conclude that the first-differenced LGDP is stationary; that is, it is I(0). The series is shown in Figure 21.9. If you compare Figure 21.9 with Figure 21.1, you will see the obvious difference between the two.


FIGURE 21.9 First differences of logs of U.S. GDP, 1947-2007 (quarterly). [Time series plot of DLGDP]

39 If it is I(d), it has to be differenced d times, where d is any integer.
Trend-Stationary Processes

As we have seen in Figure 21.5, a TSP is stationary around the trend line. Hence, the simplest way to make such a time series stationary is to regress it on time; the residuals from this regression will then be stationary. In other words, run the following regression:

$$Y_t = \beta_1 + \beta_2 t + u_t \qquad (21.10.2)$$

where $Y_t$ is the time series under study and where $t$ is the trend variable measured chronologically.

Now

$$\hat{u}_t = (Y_t - \hat{\beta}_1 - \hat{\beta}_2 t) \qquad (21.10.3)$$

will be stationary. $\hat{u}_t$ is known as a (linearly) detrended time series.

It is important to note that the trend may be nonlinear. For example, it could be

$$Y_t = \beta_1 + \beta_2 t + \beta_3 t^2 + u_t \qquad (21.10.4)$$

which is a quadratic trend series. If that is the case, the residuals from Eq. (21.10.4) will now be a (quadratically) detrended time series.
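Detrending is OLS on time followed by taking residuals. The sketch below is written in plain Python with a hypothetical `detrend` helper and made-up, noise-free series. It fits Eq. (21.10.2) (degree 1) or Eq. (21.10.4) (degree 2) through the normal equations and returns $\hat{u}_t$. A quadratic trend is not removed by a linear detrend, which is why the form of the trend matters.

```python
def detrend(y, degree=1):
    """Residuals u_hat_t from regressing y on 1, t, ..., t**degree with t = 1, 2, ..."""
    X = [[float(t) ** p for p in range(degree + 1)] for t in range(1, len(y) + 1)]
    k = degree + 1
    # Normal equations (X'X) b = X'y stored as an augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * v for r, v in zip(X, y))]
         for i in range(k)]
    for c in range(k):                       # forward elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):           # back substitution
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return [v - sum(bj * xj for bj, xj in zip(b, r)) for r, v in zip(X, y)]


t_index = range(1, 21)
linear = [3.0 + 2.0 * t for t in t_index]              # exact linear trend
quadratic = [1.0 + t + 0.1 * t * t for t in t_index]   # exact quadratic trend

res_lin = detrend(linear, 1)
res_quad_1 = detrend(quadratic, 1)   # linear detrend leaves the curvature behind
res_quad_2 = detrend(quadratic, 2)   # quadratic detrend removes it
print(max(abs(e) for e in res_quad_1), max(abs(e) for e in res_quad_2))
```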

It should be pointed out that if a time series is DSP but we treat it as TSP, this is called underdifferencing. On the other hand, if a time series is TSP but we treat it as DSP, this is called overdifferencing. The consequences of these types of specification errors can be serious, depending on how one handles the serial correlation properties of the resulting error terms. 40
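One concrete consequence of overdifferencing is easy to verify. Suppose $Y_t = \beta_1 + \beta_2 t + e_t$ with white noise $e_t$ is treated as DSP. Then $\Delta Y_t = \beta_2 + e_t - e_{t-1}$, a moving-average error whose first-order autocorrelation is $-1/2$. The sketch below simulates such a TSP series and compares sample lag-1 autocorrelations. The coefficients 1 and 0.05, the sample size, and the seed are assumptions.

```python
import random


def lag1_autocorr(x):
    """Sample first-order autocorrelation of the sequence x."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i - 1] - m) for i in range(1, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


rng = random.Random(3)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Hypothetical TSP series: Y_t = 1 + 0.05 t + e_t with white noise e_t.
y = [1.0 + 0.05 * t + e for t, e in enumerate(noise, start=1)]
dy = [b - a for a, b in zip(y[:-1], y[1:])]   # overdifferenced: 0.05 + e_t - e_{t-1}

r1_noise = lag1_autocorr(noise)   # close to 0 for white noise
r1_dy = lag1_autocorr(dy)         # close to -0.5: MA(1) error created by differencing
print(f"lag-1 autocorrelation: e_t {r1_noise:.3f}, Delta Y_t {r1_dy:.3f}")
```

The differenced series is stationary, but its errors are serially correlated by construction. This is the kind of serial correlation the paragraph above warns about.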

To see what happens if we confuse a TSP series with a DSP series or vice versa, Figure 21.10 shows the first-differenced LGDP and the residuals of LGDP estimated from the TSP regression (21.10.2):


FIGURE 21.10 First differences (delta LGDP) and deviations from trend (RESI1) for logged GDP, 1947-2007 (quarterly).

40 For a detailed discussion of this, see Maddala et al., op. cit., Section 2.7. 
A look at this figure tells us that the first differences of real logged GDP are stationary (as confirmed by regression (21.10.1)), but the residuals from the trend line (RESI1) are not.

In summary, “. . . it is very important to apply the right sort of stationarity transform to 
the data, if they are not already stationary. Most financial markets generate price, rate or 
yield data that are non-stationary because of stochastic rather than a deterministic trend. It 
is hardly ever appropriate to detrend the data by fitting a trend line and taking deviations. 
Instead the data should be detrended by taking first differences, usually of the log price or 
rates, because then the transformed stationary data will correspond to market returns.” 41 

21.11 Cointegration: Regression of a Unit Root Time Series on Another Unit Root Time Series

We have warned that the regression of a nonstationary time series on another nonstationary time series may produce a spurious regression. Let us suppose that we consider the LPCE and LDPI time series data introduced in Section 21.1 (see the book’s website for the actual data). Subjecting these time series individually to unit root analysis, you will find that they both are I(1); that is, they contain a stochastic trend. It is quite possible that the two series share the same common trend, so that the regression of one on the other will not necessarily be spurious.

To be specific, we use the U.S. economic time series data (see Section 21.1 and the book’s website) and run the following regression of LPCE on LDPI:

$$\text{LPCE}_t = \beta_1 + \beta_2\,\text{LDPI}_t + u_t \qquad (21.11.1)$$

where L denotes logarithm and $\beta_2$ is the elasticity of real personal consumption expenditure with respect to real disposable personal income. For illustrative purposes, we will call it consumption elasticity. Let us write this as:

$$u_t = \text{LPCE}_t - \beta_1 - \beta_2\,\text{LDPI}_t \qquad (21.11.2)$$

Suppose we now subject $u_t$ to unit root analysis and find that it is stationary; that is, it is I(0). This is an interesting situation, for although $\text{LPCE}_t$ and $\text{LDPI}_t$ are individually I(1), that is, they have stochastic trends, their linear combination (21.11.2) is I(0). So to speak, the linear combination cancels out the stochastic trends in the two series. If you take consumption and income as two I(1) variables, savings defined as (income - consumption) could be I(0). As a result, a regression of consumption on income as in Eq. (21.11.1) would be meaningful (i.e., not spurious). In this case we say that the two variables are cointegrated. Economically speaking, two variables will be cointegrated if they have a long-term, or equilibrium, relationship between them. Economic theory is often expressed in equilibrium terms, such as Fisher’s quantity theory of money or the theory of purchasing power parity (PPP), just to name a few.

In short, provided we check that the residuals from regressions like (21.11.1) are I(0) or stationary, the traditional regression methodology (including the t and F tests) that we have considered extensively is applicable to data involving (nonstationary) time series. The valuable contribution of the concepts of unit root, cointegration, etc. is to force us to find out if the regression residuals are stationary. As Granger notes, “A test for cointegration can be thought of as a pre-test to avoid ‘spurious regression’ situations.” 42

In the language of cointegration theory, a regression such as Eq. (21.11.1) is known as 
a cointegrating regression and the slope parameter is known as the cointegrating 
parameter. The concept of cointegration can be extended to a regression model containing 
k regressors. In this case we will have k cointegrating parameters. 

41 Carol Alexander, op. cit., p. 324. 

42 C. W. J. Granger, "Developments in the Study of Co-Integrated Economic Variables," Oxford Bulletin 
of Economics and Statistics, vol. 48, 1986, p. 226. 
Testing for Cointegration

A number of methods for testing cointegration have been proposed in the literature. We consider here a comparatively simple method, namely the DF or ADF unit root test on the residuals estimated from the cointegrating regression. 43

Engle-Granger (EG) or Augmented Engle-Granger (AEG) Test
We already know how to apply the DF or ADF unit root tests. All we have to do is estimate a regression like Eq. (21.11.1), obtain the residuals, and use the DF or ADF tests. 44 There is one precaution to exercise, however. Since the estimated $\hat{u}_t$ are based on the estimated cointegrating parameter $\hat{\beta}_2$, the DF and ADF critical significance values are not quite appropriate. Engle and Granger have calculated these values, which can be found in the references. 45 Therefore, the DF and ADF tests in the present context are known as Engle-Granger (EG) and augmented Engle-Granger (AEG) tests. However, several software packages now present these critical values along with other outputs.
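The two steps of the EG test can be sketched as follows, using simulated data rather than the LPCE and LDPI series. Here `x` is a random walk and `y` is built to share its stochastic trend, so the pair is cointegrated by construction. Step 1 runs the cointegrating regression; step 2 runs a DF regression on the residuals. The decision uses the asymptotic 5 percent Engle-Granger value of about -3.34 quoted later in the text, not the DF table. Helper names, seed, and coefficients are illustrative.

```python
import math
import random


def ols(X, y):
    """OLS of y on the columns of X (a list of rows); returns (beta, se, residuals)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se, resid


rng = random.Random(11)
T = 500
x = [0.0]                                   # hypothetical I(1) regressor (random walk)
for _ in range(T - 1):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
y = [0.5 + 1.0 * xv + rng.gauss(0.0, 1.0) for xv in x]   # cointegrated with x

# Step 1: cointegrating regression y_t = b1 + b2 x_t + u_t (cf. Eq. 21.11.1).
b, _, u_hat = ols([[1.0, xv] for xv in x], y)

# Step 2: DF regression on the residuals, no intercept (cf. Eq. 21.11.4).
du = [u_hat[t] - u_hat[t - 1] for t in range(1, T)]
g, g_se, _ = ols([[u_hat[t - 1]] for t in range(1, T)], du)
tau = g[0] / g_se[0]

EG_CRIT_5PCT = -3.34   # asymptotic 5 percent Engle-Granger value quoted in the text
print(f"b2 = {b[1]:.4f}, tau = {tau:.2f}, cointegrated at 5%: {tau < EG_CRIT_5PCT}")
```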

Let us illustrate these tests. Using the data introduced in Section 21.1 and found on the 
book’s website, we first regressed LPCEC on LDPIC and obtained the following regression: 

$$\widehat{\text{LPCE}}_t = -0.1942 + 1.0114\,\text{LDPI}_t \qquad (21.11.3)$$

t = (-8.2328) (348.5429)    R² = 0.9980    d = 0.1558

Since LPCE and LDPI are individually nonstationary, there is the possibility that this 
regression is spurious. But when we performed a unit root test on the residuals obtained 
from Eq. (21.11.3), we obtained the following results: 

$$\Delta\hat{u}_t = -0.0764\,\hat{u}_{t-1} \qquad (21.11.4)$$

t = (-3.0458)    R² = 0.0369    d = 2.5389

The Engle-Granger asymptotic 5 percent and 10 percent critical values are about -3.34 and -3.04, respectively. Therefore, the residuals from the regression are not stationary at the 5 percent level. It would be difficult to accept this result, for economic theory suggests that there should be a stable relationship between PCE and DPI.

Let us reestimate Eq. (21.11.3) including the trend variable and then see if the residuals 
from this equation are stationary. We present the results first and then discuss what may be 
going on. 

$$\widehat{\text{LPCE}}_t = 2.8130 + 0.0037\,t + 0.5844\,\text{LDPI}_t \qquad (21.11.3a)$$

t = (21.3491) (22.9394) (31.2754)    R² = 0.9994    d = 0.2956


43 There is this difference between tests for unit roots and tests for cointegration. As David A. Dickey, 
Dennis W. Jansen, and Daniel I. Thornton observe, "Tests for unit roots are performed on univariate [i.e., 
single] time series. In contrast, cointegration deals with the relationship among a group of variables, 
where (unconditionally) each has a unit root." See their article, "A Primer on Cointegration with an 
Application to Money and Income," Economic Review, Federal Reserve Bank of St. Louis, March-April 
1991, p. 59. As the name suggests, this article is an excellent introduction to cointegration testing. 

44 If PCE and DPI are not cointegrated, any linear combination of them will be nonstationary and, therefore, the $u_t$ will also be nonstationary.

45 R. F. Engle and C. W. Granger, "Co-integration and Error Correction: Representation, Estimation and 
Testing," Econometrica, vol. 55, 1987, pp. 251-276. 
To see if the residuals from this regression are stationary, we obtained the following results (compare with Eq. [21.11.4]):

$$\Delta\hat{u}_t = -0.1498\,\hat{u}_{t-1} \qquad (21.11.4a)$$

t = (-4.4545)    R² = 0.0758    d = 2.3931

Note: $\hat{u}_t$ is the residual from Eq. (21.11.3a).

The DF test now shows that these residuals are stationary. Even if we use ADF with several lags, the residuals are still stationary.

What is going on here? Although the residuals from regression (21.11.3a) are stationary, that is, they are I(0), they are stationary around a deterministic time trend, the trend here being linear. That is, the residuals are I(0) plus a linear trend. As noted earlier, a time series may contain both a deterministic and a stochastic trend.

Before we proceed further, it should be noted that our time series data cover a long period of time (61 years). Because of structural changes in the U.S. economy over this period, our results and conclusions may well differ across subperiods. In Exercise 21.28 you are asked to check for this possibility.

Cointegration and Error Correction Mechanism (ECM) 

We just showed that, allowing for the (linear) trend, LPCE and LDPI seem to be cointegrated, that is, there is a long-term, or equilibrium, relationship between the two. Of course, in the short run there may be disequilibrium. Therefore, we can treat the error term in the following equation as the “equilibrium error.” And we can use this error term to tie the short-run behavior of PCE to its long-run value:

$$u_t = \text{LPCE}_t - \beta_1 - \beta_2\,\text{LDPI}_t - \beta_3 t \qquad (21.11.5)$$

The error correction mechanism (ECM) first used by Sargan 46 and later popularized by Engle and Granger corrects for disequilibrium. An important theorem, known as the Granger representation theorem, states that if two variables Y and X are cointegrated, the relationship between the two can be expressed as ECM. To see what this means, let us revert to our PCE-DPI example. Now consider the following model:

$$\Delta\text{LPCE}_t = \alpha_0 + \alpha_1\,\Delta\text{LDPI}_t + \alpha_2 u_{t-1} + \varepsilon_t \qquad (21.11.6)$$

where $\varepsilon_t$ is a white noise error term and $u_{t-1}$ is the lagged value of the error term in Eq. (21.11.5).

ECM equation (21.11.6) states that $\Delta\text{LPCE}$ depends on $\Delta\text{LDPI}$ and also on the equilibrium error term. 47 If the latter is nonzero, then the model is out of equilibrium. Suppose $\Delta\text{LDPI}$ is zero and $u_{t-1}$ is positive. This means $\text{LPCE}_{t-1}$ is too high to be in equilibrium, that is, $\text{LPCE}_{t-1}$ is above its equilibrium value of $(\beta_1 + \beta_2\,\text{LDPI}_{t-1} + \beta_3 (t-1))$. Since $\alpha_2$ is expected to be negative, the term $\alpha_2 u_{t-1}$ is negative and, therefore, $\Delta\text{LPCE}_t$ will be negative to restore the equilibrium. That is, if $\text{LPCE}_{t-1}$ is above its equilibrium value, it will start falling in the next period to correct the equilibrium error; hence the name ECM. By the same token, if $u_{t-1}$ is negative (i.e., LPCE is below its equilibrium value), $\alpha_2 u_{t-1}$ will be positive, which will cause $\Delta\text{LPCE}_t$ to be positive, leading $\text{LPCE}_t$ to rise in period $t$. Thus, the absolute value of $\alpha_2$ decides how quickly the equilibrium is restored. In practice, we estimate $u_{t-1}$ by

46 J. D. Sargan, "Wages and Prices in the United Kingdom: A Study in Econometric Methodology," in 
K. F. Wallis and D. F. Hendry, eds., Quantitative Economics and Econometric Analysis, Basil Blackwell, 
Oxford, U.K., 1984. 

47 The following discussion is based on Gary Koop, op. cit., pp. 159-160 and Kerry Patterson, op. cit., Section 8.5.
$$\hat{u}_{t-1} = (\text{LPCE}_{t-1} - \hat{\beta}_1 - \hat{\beta}_2\,\text{LDPI}_{t-1} - \hat{\beta}_3 (t-1))$$

Keep in mind that the error correction coefficient $\alpha_2$ is expected to be negative (why?).
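A two-step (Engle-Granger) estimate of an ECM like Eq. (21.11.6) can be sketched on simulated data. The assumptions are explicit. The long-run relation is $y^* = x$ with no trend term (unlike Eq. (21.11.5)). The true short-run coefficient is 0.3, the true adjustment coefficient $\alpha_2$ is -0.2, and `ols` is a hypothetical plain-Python helper. Step 1 estimates the long-run regression and saves $\hat{u}_t$. Step 2 regresses $\Delta y_t$ on a constant, $\Delta x_t$, and $\hat{u}_{t-1}$.

```python
import math
import random


def ols(X, y):
    """OLS of y on the columns of X (a list of rows); returns (beta, se, residuals)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se, resid


rng = random.Random(5)
T = 1000
ALPHA1, ALPHA2 = 0.3, -0.2          # hypothetical short-run and adjustment coefficients
x, y = [0.0], [0.0]
for _ in range(1, T):
    x.append(x[-1] + rng.gauss(0.0, 1.0))       # I(1) driver (think LDPI)
    u_prev = y[-1] - x[-2]                      # equilibrium error y_{t-1} - x_{t-1}
    dy = ALPHA1 * (x[-1] - x[-2]) + ALPHA2 * u_prev + rng.gauss(0.0, 0.5)
    y.append(y[-1] + dy)

# Step 1: long-run (cointegrating) regression of y on a constant and x.
_, _, u_hat = ols([[1.0, xv] for xv in x], y)

# Step 2: ECM regression, cf. Eq. (21.11.6): Delta y_t on 1, Delta x_t, u_hat_{t-1}.
rows = [[1.0, x[t] - x[t - 1], u_hat[t - 1]] for t in range(1, T)]
dys = [y[t] - y[t - 1] for t in range(1, T)]
a, a_se, _ = ols(rows, dys)
print(f"alpha1 = {a[1]:.3f}, alpha2 = {a[2]:.3f} (t = {a[2] / a_se[2]:.2f})")
```

The estimated coefficient on $\hat{u}_{t-1}$ should come out negative and close to -0.2. In this artificial example, that means roughly a fifth of any disequilibrium is removed each period.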

Returning to our illustrative example, the empirical counterpart of Eq. (21.11.6) is: 
$$\widehat{\Delta\text{LPCE}}_t = 0.0061 + 0.2967\,\Delta\text{LDPI}_t - 0.1223\,\hat{u}_{t-1} \qquad (21.11.7)$$

t = (9.6753) (6.2282) (-3.8461)    R² = 0.1658    d = 2.1496

Statistically, the ECM term is significant, suggesting that PCE adjusts to DPI with a lag;
only about 12 percent of the discrepancy between long-term and short-term PCE is
corrected within a quarter.
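As a rough illustration of what $\alpha_2 = -0.1223$ implies, treat the ECM term in isolation: ignore the intercept, the trend, changes in LDPI, and new shocks (simplifications made only for this sketch, not in the text). Each quarter the gap then shrinks by the factor $1 + \alpha_2$, so the number of quarters needed to close half of an initial disequilibrium is $\ln 0.5 / \ln(1 + \alpha_2)$.

```python
import math

alpha2 = -0.1223                 # error-correction coefficient from Eq. (21.11.7)
factor = 1 + alpha2              # share of last quarter's gap that survives (ECM term only)
half_life = math.log(0.5) / math.log(factor)

for k in (1, 4, 8):
    print(f"after {k} quarter(s): {factor ** k:.3f} of the initial gap remains")
print(f"half-life: {half_life:.1f} quarters")
```

Under these assumptions the half-life comes to a little over five quarters.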

From regression (21.11.7) we see that the short-run consumption elasticity is about
0.30. The long-run elasticity is about 0.58, which can be seen from Eq. (21.11.3).

Before we conclude this section, the caution sounded by S. G. Hall is worth remembering: 

While the concept of cointegration is clearly an important theoretical underpinning of the error 
correction model there are still a number of problems surrounding its practical application; the 
critical values and small sample performance of many of these tests are unknown for a wide 
range of models; informed inspection of the correlogram may still be an important tool. 48 
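Hall's reminder about the correlogram is easy to act on, since the sample autocorrelations take only a few lines to compute. The NumPy sketch below is illustrative: the helper name `sample_acf` is hypothetical, the series are simulated rather than taken from the text's data, and the ±1.96/√n band is the usual large-sample approximation for a white-noise series.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelations rho_k for k = 0..nlags (the correlogram)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = d @ d
    return np.array([d[k:] @ d[:len(d) - k] / denom for k in range(nlags + 1)])

rng = np.random.default_rng(3)
noise = rng.normal(size=2000)        # purely random (white noise) series
walk = np.cumsum(noise)              # nonstationary random walk built from the same shocks
acf_noise = sample_acf(noise, 10)
acf_walk = sample_acf(walk, 10)
print("white noise, lags 1-3:", np.round(acf_noise[1:4], 3))
print("random walk, lags 1-3:", np.round(acf_walk[1:4], 3))
# Rough 95% band for a white-noise correlogram: +/- 1.96 / sqrt(n)
print("band: +/-", round(1.96 / np.sqrt(len(noise)), 3))
```

The random walk's autocorrelations stay close to 1 and die off only slowly, while those of the white-noise series hover inside the band.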


21.12 Some Economic Applications 


We conclude this chapter by considering some concrete examples. 


EXAMPLE 21.1

M1 Monthly Money Supply in the United States, January 1959 to March 1, 2008


Figure 21.11 shows the M1 money supply for the United States from January 1959 to
March 1, 2008. From our knowledge of stationarity, it seems that the M1 money supply
time series is nonstationary, which can be confirmed by unit root analysis. (Note: To save
space, we have not given the actual data, which can be obtained from the Federal Reserve
Board or the Federal Reserve Bank of St. Louis.)

FIGURE 21.11 U.S. money supply over 1959:01 to 2008:03.

48 S. G. Hall, "An Application of the Granger and Engle Two-Step Estimation Procedure to the United
Kingdom Aggregate Wage Data," Oxford Bulletin of Economics and Statistics, vol. 48, no. 3, August
1986, p. 238. See also John Y. Campbell and Pierre Perron, "Pitfalls and Opportunities: What
Macroeconomists Should Know about Unit Roots," NBER (National Bureau of Economic Research)
Macroeconomics Annual 1991, pp. 141-219.

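The Dickey-Fuller regression with intercept and trend gave the following results:

$$\widehat{\Delta M}_t = -0.1347 + 0.0293\,t - 0.0102\,M_{t-1} \qquad (21.12.1)$$

t = (-0.14) (2.62) (-2.30)

R² = 0.0130   d = 2.2325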

The 1, 5, and 10 percent critical τ values are -3.9811, -3.4210, and -3.1329. Since the
t (= τ) value of -2.30 is less negative than any of these critical values, the conclusion is that the
M1 time series is nonstationary; that is, it contains a unit root, or it is I(1). Even when several
lagged values of $\Delta M_t$ (à la ADF) were introduced, the conclusion did not change. On
the other hand, the first differences of the M1 money supply were found to be stationary
(check this out).
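The τ statistics used in these examples come from regressions of the Dickey-Fuller form. The NumPy sketch below uses a hypothetical helper, `adf_tau` (not a function of any package), to compute $\hat\delta$ and its τ ratio, with an intercept, an optional trend, and an optional number of lagged differences in the ADF manner. It reproduces only the mechanics: the resulting τ must still be compared with Dickey-Fuller critical values such as those quoted above, not with Student t values. The two simulated series are illustrative and are not the M1 data.

```python
import numpy as np

def adf_tau(y, lags=0, trend=True):
    """delta-hat and its tau ratio from the (augmented) Dickey-Fuller regression
        dy_t = b1 [+ b2*t] + delta*y_{t-1} + sum_{i=1..lags} a_i*dy_{t-i} + e_t.
    trend=False drops b2*t but keeps the intercept b1."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = len(dy) - lags                          # usable observations
    cols = [np.ones(n)]
    if trend:
        cols.append(np.arange(lags + 1, lags + 1 + n, dtype=float))
    cols.append(y[lags:-1])                     # y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:len(dy) - i])   # dy_{t-i}
    X = np.column_stack(cols)
    target = dy[lags:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (n - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    k = 2 if trend else 1                       # column holding y_{t-1}
    return coef[k], coef[k] / np.sqrt(cov[k, k])

rng = np.random.default_rng(1)
e = rng.normal(size=2000)
random_walk = np.cumsum(e)                      # has a unit root
stationary = np.empty(2000)
stationary[0] = e[0]
for t in range(1, 2000):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]   # AR(1) with rho = 0.5

for name, series in [("random walk", random_walk), ("AR(1), rho = 0.5", stationary)]:
    d, tau = adf_tau(series, lags=1)
    print(f"{name}: delta = {d:.4f}, tau = {tau:.2f}")
```

For the random walk, $\hat\delta$ sits close to zero and τ is usually well above the critical values; for the stationary series, τ is far more negative than any of them.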


EXAMPLE 21.2

The U.S./U.K. Exchange Rate: January 1971 to April 2008


Figure 21.12 gives the graph of the ($/£) exchange rate from January 1971 to April 2008,
for a total of 286 observations. By now you should be able to spot this time series as
nonstationary. Carrying out the unit root tests, we obtained the following τ statistics: -0.82
(no intercept, no trend), -1.96 (intercept), and -1.33 (intercept and trend). Each of these
statistics, in absolute value, was less than its critical τ value from the appropriate DF tables,
thus confirming the graphical impression that the U.S./U.K. exchange rate time series is
nonstationary.



EXAMPLE 21.3

U.S. Consumer Price Index (CPI), January 1947 to March 2008


Figure 21.13 shows the U.S. CPI from January 1947 to March 2008 for a total of 733
observations. The CPI series, like the M1 series considered previously, shows a sustained
upward trend. The unit root exercise gave the following results:

$$\widehat{\Delta\text{CPI}}_t = -0.01082 + 0.00068\,t - 0.00096\,\text{CPI}_{t-1} + 0.40669\,\Delta\text{CPI}_{t-1} \qquad (21.12.2)$$

t = (-0.54) (4.27) (-1.77) (12.03)

R² = 0.3570   d = 1.9295


The t (= τ) value of $\text{CPI}_{t-1}$ is -1.77. The 10 percent critical value is -3.1317. Since, in
absolute terms, the computed τ is less than the critical τ, the conclusion is that CPI is not
a stationary time series. We can characterize it as having a stochastic trend (why?).
However, if you take the first differences of the CPI series, you will find them to be
stationary. Hence CPI is a difference-stationary (DS) time series.


EXAMPLE 21.4

Are 3-Month and 6-Month Treasury Bill Rates Cointegrated?

Figure 21.14 plots (constant maturity) 3-month and 6-month U.S. Treasury bill (T-bill)
rates from January 1982 to March 2008, for a total of 315 observations. Does the graph
show that the two rates are cointegrated; that is, is there an equilibrium relationship
between the two? From financial theory, we would expect that to be the case; otherwise,
arbitrageurs will exploit any discrepancy between the short and the long rates. First of all,
let us see if the two time series are stationary.

FIGURE 21.14 Three- and six-month Treasury bill rates (constant maturity).










768 Part Four Simultaneous-Equation Models and Time Series Econometrics 


On the basis of the pure random walk model (i.e., no intercept, no trend), both
rates were stationary. Including intercept, trend, and one lagged difference, the results
suggested that the two rates might be trend stationary; the trend coefficient in both
cases was negative and significant at about the 7 percent level. So, depending on which
results we accept, the two rates are either stationary or trend stationary.

Regressing the 6-month T-bill rate (TB6) on the 3-month T-bill rate (TB3), we obtained the
following regression:

$$\widehat{\text{TB6}}_t = 0.0842 + 1.0078\,\text{TB3}_t \qquad (21.12.3)$$

t = (3.65) (252.39)

R² = 0.995   d = 0.4035

Applying the unit root test to the residuals from the preceding regression, we found that
the residuals were stationary, suggesting that the 3- and 6-month T-bill rates were
cointegrated. Using this knowledge, we obtained the following error correction model (ECM):

$$\widehat{\Delta\text{TB6}}_t = -0.0047 + 0.8992\,\Delta\text{TB3}_t - 0.1855\,\hat u_{t-1} \qquad (21.12.4)$$

t = (-0.82) (47.77) (-5.69)

R² = 0.880   d = 1.5376

where $\hat u_{t-1}$ is the lagged value of the error correction term from the preceding period.
As these results show, about 0.19 of the discrepancy between the two rates in the previous month is
eliminated this month. 49 Besides, short-run changes in the 3-month T-bill rate are quickly
reflected in the 6-month T-bill rate, as the slope coefficient between the two is 0.8992.
This should not be a surprising finding in view of the efficiency of the U.S. money markets.
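The two-step logic of this example (a levels regression, then a unit root test on its residuals) can be sketched with simulated series standing in for the T-bill rates. Everything here is hypothetical: the helper names, the data-generating process, and the parameter values. Because the residuals come from an estimated regression, the τ statistic should be judged against Engle-Granger critical values, which are more negative than the ordinary DF values (roughly -3.3 to -3.4 at the 5 percent level for two variables with an intercept), not against the DF tables.

```python
import numpy as np

def ols_resid(y, X):
    """OLS coefficients of y on X and the corresponding residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def eg_tau(y, x):
    """Engle-Granger check: regress y on a constant and x, then run a DF
    regression without intercept on the residuals, du_t = delta*u_{t-1} + e_t.
    Returns (levels coefficients, tau of delta)."""
    n = len(y)
    coef, u = ols_resid(y, np.column_stack([np.ones(n), x]))
    du, u_lag = np.diff(u), u[:-1]
    d = (u_lag @ du) / (u_lag @ u_lag)           # OLS slope without intercept
    e = du - d * u_lag
    se = np.sqrt((e @ e) / (len(du) - 1) / (u_lag @ u_lag))
    return coef, d / se

rng = np.random.default_rng(2)
n = 1000
tb3 = 5 + np.cumsum(rng.normal(scale=0.2, size=n))      # hypothetical I(1) short rate
spread = np.zeros(n)
for t in range(1, n):
    spread[t] = 0.8 * spread[t - 1] + rng.normal(scale=0.1)   # stationary deviation
tb6 = 0.1 + tb3 + spread                                  # cointegrated with tb3 by construction
other = 5 + np.cumsum(rng.normal(scale=0.2, size=n))     # unrelated random walk

coef, tau_coint = eg_tau(tb6, tb3)
_, tau_unrelated = eg_tau(other, tb3)
print("levels regression:", np.round(coef, 3), " tau:", round(tau_coint, 2))
print("unrelated walks, tau:", round(tau_unrelated, 2))
```

The cointegrated pair yields a strongly negative τ and a slope near 1, while the unrelated pair usually does not reject a unit root in the residuals.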


Summary and Conclusions


1. Regression analysis based on time series data implicitly assumes that the underlying 
time series are stationary. The classical t tests, F tests, etc., are based on this assumption. 

2. In practice most economic time series are nonstationary. 

3. A stochastic process is said to be weakly stationary if its mean, variance, and
autocovariances are constant over time (i.e., they are time-invariant).

4. At the informal level, weak stationarity can be tested by the correlogram of a time
series, which is a graph of autocorrelation at various lags. For stationary time series, the
correlogram tapers off quickly, whereas for nonstationary time series it dies off
gradually. For a purely random series, the autocorrelations at all lags 1 and greater are zero.

5. At the formal level, stationarity can be checked by finding out if the time series contains 
a unit root. The Dickey-Fuller (DF) and augmented Dickey-Fuller (ADF) tests can be 
used for this purpose. 

6. An economic time series can be trend stationary (TS) or difference stationary (DS). 
A TS time series has a deterministic trend, whereas a DS time series has a variable, or 
stochastic, trend. The common practice of including the time or trend variable in a 
regression model to detrend the data is justifiable only for TS time series. The DF and 
ADF tests can be applied to determine whether a time series is TS or DS. 


49 Since both T-bill rates are in percent form, this would suggest that if last month the 6-month TB rate
exceeded the 3-month TB rate by more than the long-run relationship implies, this month about 0.19 of
that excess (roughly 0.19 percentage point for each percentage point of excess) will be eliminated to
restore the long-run relationship between the two interest rates. For the underlying theory about the
relationship between short- and long-run interest rates, see any money and banking textbook and read
up on the term structure of interest rates.
7. Regression of one time series variable on one or more time series variables often can
give nonsensical or spurious results. This phenomenon is known as spurious regression. 
One way to guard against it is to find out if the time series are cointegrated. 

8. Cointegration means that despite being individually nonstationary, a linear
combination of two or more time series can be stationary. The Engle-Granger (EG) and the
augmented Engle-Granger (AEG) tests can be used to find out if two or more time 
series are cointegrated. 

9. Cointegration of two (or more) time series suggests that there is a long-run, or 
equilibrium, relationship between them. 

10. The error correction mechanism (ECM) developed by Engle and Granger is a means 
of reconciling the short-run behavior of an economic variable with its long-run behavior. 

11. The field of time series econometrics is evolving. The established results and tests are in 
some cases tentative and a lot more work remains. An important question that needs an 
answer is why some economic time series are stationary and others are nonstationary. 
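Points 7 and 8 can be checked by simulation. The sketch below regresses one independent random walk on another, so any apparent fit is spurious by construction, and then repeats the regression in first differences (the same experiment as Exercise 21.24). It uses NumPy only, and the helper name `regress` is hypothetical. In levels the Durbin-Watson d is typically far below 2 and the R² can be sizable; in first differences R² is close to zero and d is close to 2.

```python
import numpy as np

def regress(y, x):
    """OLS of y on a constant and x; returns (R-squared, Durbin-Watson d)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef
    r2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(e) ** 2) / (e @ e)
    return r2, dw

rng = np.random.default_rng(4)
n = 2000
y = np.cumsum(rng.normal(size=n))   # two independent random walks:
x = np.cumsum(rng.normal(size=n))   # there is no true relationship between them
r2_lev, dw_lev = regress(y, x)
r2_dif, dw_dif = regress(np.diff(y), np.diff(x))
print(f"levels:      R2 = {r2_lev:.3f}, d = {dw_lev:.3f}")
print(f"differences: R2 = {r2_dif:.4f}, d = {dw_dif:.3f}")
```

Rerunning with other seeds is instructive: the levels R² jumps around from draw to draw, which is exactly why it cannot be trusted when the series are nonstationary and not cointegrated.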


Questions 

21.1. What is meant by weak stationarity?

21.2. What is meant by an integrated time series? 

21.3. What is the meaning of a unit root? 

21.4. If a time series is I(3), how many times would you have to difference it to make it
stationary? 

21.5. What are Dickey-Fuller (DF) and augmented DF tests? 

21.6. What are Engle-Granger (EG) and augmented EG tests? 

21.7. What is the meaning of cointegration? 

21.8. What is the difference, if any, between tests of unit roots and tests of cointegration?

21.9. What is spurious regression? 

21.10. What is the connection between cointegration and spurious regression? 

21.11. What is the difference between a deterministic trend and a stochastic trend? 

21.12. What is meant by a trend-stationary process (TSP) and a difference-stationary 
process (DSP)? 

21.13. What is a random walk (model)? 

21.14. “For a random walk stochastic process, the variance is infinite.” Do you agree? 
Why? 

21.15. What is the error correction mechanism (ECM)? What is its relationship with 
cointegration? 

Empirical Exercises 

21.16. Using the U.S. economic time series data posted on the book’s website, obtain 
sample correlograms up to 36 lags for the time series LPCE, LDPI, LCP (profits),
and LDIVIDENDS. What general pattern do you see? Intuitively, which one(s) of
these time series seem(s) to be stationary?

21.17. For each of the time series of Exercise 21.16, use the DF test to find out if these 
series contain a unit root. If a unit root exists, how would you characterize such a 
time series? 
21.18. Continue with Exercise 21.17. How would you decide if the ADF test is more
appropriate than the DF test? 

21.19. Consider the dividends and profits time series given in the U.S. economic time
series data posted on the book’s website. Since dividends depend on profits,
consider the following simple model:

$$\text{LDIVIDENDS}_t = \beta_1 + \beta_2\,\text{LCP}_t + u_t$$

a. Would you expect this regression to suffer from the spurious regression 
phenomenon? Why? 

b. Are the logged Dividends and logged Profits time series cointegrated? How do 
you test for this explicitly? If, after testing, you find that they are cointegrated, 
would your answer in (a) change? 

c. Employ the error correction mechanism (ECM) to study the short- and long-run 
behavior of dividends in relation to profits. 

d. If you examine the LDIVIDENDS and LCP series individually, do they exhibit 
stochastic or deterministic trends? What tests do you use? 

*e. Assume LDIVIDENDS and LCP are cointegrated. Then, instead of regressing 
dividends on profits, you regress profits on dividends. Is such a regression valid? 

21.20. Take the first differences of the time series given in the U.S. economic time series 
data posted on the book’s website and plot them. Also obtain a correlogram of each 
time series up to 36 lags. What strikes you about these correlograms? 

21.21. Instead of regressing LDIVIDENDS on LCP in level form, suppose you regress the 
first difference of LDIVIDENDS on the first difference of LCP. Would you include the 
intercept in this regression? Why or why not? Show the calculations. 

21.22. Continue with the previous exercise. How would you test the first-difference
regression for stationarity? In the present example, what would you expect a priori
and why? Show all the calculations. 

21.23. From the U.K. private sector housing starts (X) for the period 1948 to 1984, Terence
Mills obtained the following regression results:†

$$\widehat{\Delta X}_t = 31.03 - 0.188\,X_{t-1}$$

se = (12.50) (0.080)

(t =) τ (-2.35)

Note: The 5 percent critical τ value is -2.95 and the 10 percent critical τ value
is -2.60.

a. On the basis of these results, is the housing starts time series stationary or
nonstationary? Alternatively, is there a unit root in this time series? How do you know?

b. If you were to use the usual t test, is the observed t value statistically significant? 
On this basis, would you have concluded that this time series is stationary? 

c. Now consider the following regression results: 

$$\widehat{\Delta^2 X}_t = 4.76 - 1.39\,\Delta X_{t-1} + 0.313\,\Delta^2 X_{t-1}$$

se = (5.06) (0.236) (0.163)

(t =) τ (-5.89)

*Optional.

†Terence C. Mills, op. cit., p. 127. Notation slightly altered.



Chapter 21 Time Series Econometrics: Some Basic Concepts 771 


where Δ² is the second difference operator, that is, the first difference of the first
difference. The estimated t value is now statistically significant. What can you say 
now about the stationarity of the time series in question? 

Note: The purpose of the preceding regression is to find out if there is a second 
unit root in the time series. 

21.24. Generate two random walk series as indicated in Eqs. (21.7.1) and (21.7.2) and 
regress one on the other. Repeat this exercise but now use their first differences and 
verify that in this regression the R² value is about zero and the Durbin-Watson d is
close to 2. 

21.25. To show that two variables, each with a deterministic trend, can lead to spurious
regression, Charemza et al. obtained the following regression based on 30
observations:*

$$\hat Y_t = 5.92 + 0.030\,X_t$$

t = (9.9) (21.2)

R² = 0.92   d = 0.06

where $Y_1 = 1, Y_2 = 2, \ldots, Y_n = n$ and $X_1 = 1, X_2 = 4, \ldots, X_n = n^2$.

a. What kind of trend does Y exhibit? And X?

b. Plot the two variables and plot the regression line. What general conclusion do 
you draw from this plot? 

21.26. From the data for the period 1971-I to 1988-IV for Canada, the following
regression results were obtained:

1. $\widehat{\ln \text{M1}}_t = -10.2571 + 1.5975 \ln \text{GDP}_t$

   t = (-12.9422) (25.8865)

   R² = 0.9463   d = 0.3254

2. $\widehat{\Delta \ln \text{M1}}_t = 0.0095 + 0.5833\,\Delta \ln \text{GDP}_t$

   t = (2.4957) (1.8958)

   R² = 0.0885   d = 1.7399

3. $\widehat{\Delta \hat u}_t = -0.1958\,\hat u_{t-1}$

   (t = τ) (-2.2521)

   R² = 0.1118   d = 1.4767

where M1 = M1 money supply, GDP = gross domestic product, both measured in
billions of Canadian dollars, ln is the natural log, and $\hat u_t$ represents the estimated
residuals from regression (1).

a. Interpret regressions (1) and (2). 

b. Do you suspect that regression (1) is spurious? Why? 

c. Is regression (2) spurious? How do you know? 

d. From the results of regression (3), would you change your conclusion in (b)?
Why? 


*Charemza et al., op. cit., p. 93. 
e. Now consider the following regression:

$$\widehat{\Delta \ln \text{M1}}_t = 0.0084 + 0.7340\,\Delta \ln \text{GDP}_t - 0.0811\,\hat u_{t-1}$$

t = (2.0496) (2.0636) (-0.8537)

R² = 0.1066   d = 1.6697

What does this regression tell you? Does this help you decide if regression (1) is 
spurious or not? 

21.27. The following regressions are based on the CPI data for the United States for the 
period 1960-2007, for a total of 48 annual observations: 

1. $\widehat{\Delta\text{CPI}}_t = 0.0334\,\text{CPI}_{t-1}$

   t = (12.37)

   R² = 0.0703   d = 0.3663   RSS = 206.65

2. $\widehat{\Delta\text{CPI}}_t = 1.8662 + 0.0192\,\text{CPI}_{t-1}$

   t = (3.27) (3.86)

   R² = 0.249   d = 0.4462   RSS = 166.921

3. $\widehat{\Delta\text{CPI}}_t = 1.1611 + 0.5344\,t - 0.1077\,\text{CPI}_{t-1}$

   t = (2.37) (4.80) (-4.02)

   R² = 0.507   d = 0.6071   RSS = 109.608

where RSS = residual sum of squares.

a. Examining the preceding regressions, what can you say about stationarity of the
CPI time series?

b. How would you choose among the three models?

c. Equation (1) is Eq. (3) minus the intercept and trend. Which test would you use
to decide if the implied restrictions of model (1) are valid? (Hint: Use the
Dickey-Fuller t and F tests. Use the approximate values given in Appendix D,
Table D.7.)

21.28. As noted in the text, there may be several structural breaks in the U.S. economic
time series dataset introduced in Section 21.1. Dummy variables are a good way of
incorporating these shifts in the data.

a. Using dummy variables to designate three different periods related to the oil
embargoes in 1973 and 1979, regress the log of personal consumption
expenditures (LPCE) on the log of disposable personal income (LDPI). Has there been a
change in the results? What is your decision about the unit root hypothesis now?

b. Several websites list the official economic cycles that may have affected the
U.S. economic time series data discussed in Section 21.1. See, for example,
http://www.nber.org/cycles/cyclesmain.html. Using the information here, create
dummy variables indicating some of the major cycles and check the results of
regressing LPCE on LDPI. Has there been a change?



Chapter 22

Time Series Econometrics: Forecasting

We noted in the Introduction that forecasting is an important part of econometric analysis, 
for some people probably the most important. How do we forecast economic variables, such 
as GDP, inflation, exchange rates, stock prices, unemployment rates, and myriad other
economic variables? In this chapter we discuss two methods of forecasting that have become
quite popular: (1) autoregressive integrated moving average (ARIMA), popularly known 
as the Box-Jenkins methodology, 1 and (2) vector autoregression (VAR). 

In this chapter we also discuss the special problems involved in forecasting prices of 
financial assets, such as stock prices and exchange rates. These asset prices are characterized 
by the phenomenon known as volatility clustering, that is, periods in which they exhibit 
wide swings for an extended time period followed by a period of comparative tranquility. 
One only has to look at the Dow Jones Index in the recent past. The so-called autoregressive 
conditional heteroscedasticity (ARCH) or generalized autoregressive conditional 
heteroscedasticity (GARCH) models can capture such volatility clustering. 

The topic of economic forecasting is vast, and specialized books have been written on 
this subject. Our objective in this chapter is to give the reader just a glimpse of this subject. 
The interested reader may consult the references for further study. Fortunately, most
modern econometric packages have user-friendly introductions to several techniques discussed
in this chapter. 

The linkage between this chapter and the previous chapter is that the forecasting 
methods discussed below assume that the underlying time series are stationary or they can 
be made stationary with appropriate transformations. As we progress through this chapter, 
you will see the use of the several concepts that we introduced in the last chapter. 

22.1 Approaches to Economic Forecasting 

Broadly speaking, there are five approaches to economic forecasting based on time 
series data: (1) exponential smoothing methods, (2) single-equation regression models, 
(3) simultaneous-equation regression models, (4) autoregressive integrated moving 
average (ARIMA) models, and (5) vector autoregression (VAR) models. 

1 G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, revised ed., Holden-Day,
San Francisco, 1978.
Exponential Smoothing Methods 2

These are essentially methods of fitting a suitable curve to historical data of a given time 
series. There are a variety of these methods, such as single exponential smoothing, Holt’s 
linear method, Holt-Winters’ method, and their variations. Although still used in several 
areas of business and economic forecasting, these are now supplemented (supplanted?) by 
the other four methods that follow. We will not discuss exponential smoothing methods in 
this chapter, for that would take us far afield. 

Single-Equation Regression Models 

The bulk of this book has been devoted to single-equation regression models. As an example 
of a single-equation model, consider the demand function for automobiles. On the basis of 
economic theory, we postulate that the demand for automobiles is a function of automobile 
prices, advertising expenditure, income of the consumer, interest rate (as a measure of the cost 
of borrowing), and other relevant variables (e.g., family size, travel distance to work). From 
time series data, we estimate an appropriate model of auto demand (either linear, log-linear, 
or nonlinear), which can be used for forecasting demand for autos in the future. Of course, as 
noted in Chapter 5, forecasting errors increase rapidly if we go too far out in the future. 

Simultaneous-Equation Regression Models 3 

In Chapters 18, 19, and 20 we considered simultaneous-equation models. In their heyday 
during the 1960s and 1970s, elaborate models of the U.S. economy based on simultaneous 
equations dominated economic forecasting. But since then the glamor of such forecasting 
models has subsided because of their poor forecasting performance, especially since the 
1973 and 1979 oil price shocks (due to OPEC oil embargoes) and also because of the
so-called Lucas critique. 4 The thrust of this critique, as you may recall, is that the parameters
estimated from an econometric model are dependent on the policy prevailing at the time the 
model was estimated and will change if there is a policy change. In short, the estimated 
parameters are not invariant in the presence of policy changes. 

For example, in October 1979 the Fed changed its monetary policy dramatically. Instead 
of targeting interest rates, it announced it would henceforth monitor the rate of growth of 
the money supply. With such a pronounced change, an econometric model estimated from 
past data will have little forecasting value in the new regime. These days the Fed’s
emphasis has changed from controlling the money supply to controlling the short-term interest
rate (the federal funds rate). 

ARIMA Models 

The publication by Box and Jenkins of Time Series Analysis: Forecasting and Control 
(op. cit.) ushered in a new generation of forecasting tools. Popularly known as the 
Box-Jenkins (BJ) methodology, but technically known as the ARIMA methodology, the
emphasis of these methods is not on constructing single-equation or simultaneous-equation
models but on analyzing the probabilistic, or stochastic, properties of economic time series 

2 For a comparatively simple exposition of these methods, see Spyros Makridakis, Steven C.
Wheelwright, and Rob J. Hyndman, Forecasting: Methods and Applications, 3d ed., John Wiley &
Sons, New York, 1998.

3 For a textbook treatment of the use of simultaneous-equation models in forecasting, see Robert S.
Pindyck and Daniel L. Rubinfeld, Econometric Models & Economic Forecasts, 4th ed., McGraw-Hill,
New York, 1998, Part III.

4 Robert E. Lucas, "Econometric Policy Evaluation: A Critique," in Carnegie-Rochester Conference 
Series, The Phillips Curve, North-Holland, Amsterdam, 1976, pp. 19-46. This article, among others, 
earned Lucas a Nobel Prize in economics. 
on their own under the philosophy "let the data speak for themselves." Unlike the
regression models, in which $Y_t$ is explained by k regressors $X_1, X_2, X_3, \ldots, X_k$, the BJ-type time
series models allow $Y_t$ to be explained by past, or lagged, values of Y itself and stochastic
error terms. For this reason, ARIMA models are sometimes called atheoretic models
because they are not derived from any economic theory, and economic theories are often
the basis of simultaneous-equation models.

In passing, note that our emphasis in this chapter is on univariate ARIMA models, that 
is, ARIMA models pertaining to a single time series. But the analysis can be extended to 
multivariate ARIMA models. 

VAR Models 

VAR methodology superficially resembles simultaneous-equation modeling in that we
consider several endogenous variables together. But each endogenous variable is explained by
its lagged, or past, values and the lagged values of all other endogenous variables in the 
model; usually, there are no exogenous variables in the model. 

In the rest of this chapter we discuss the fundamentals of Box-Jenkins and VAR 
approaches to economic forecasting. Our discussion is elementary and heuristic. The 
reader wishing to pursue this subject further is advised to consult the references. 5 

22.2 AR, MA, and ARIMA Modeling of Time Series Data 

To introduce several ideas, some old and some new, let us work with the GDP time series 
data for the United States introduced in Section 21.1 (see the book’s website for the actual 
data). A plot of this time series is already given in Figures 21.1 (undifferenced logged GDP) 
and 21.9 (first-differenced LGDP); recall that LGDP in level form is nonstationary but in 
the (first) differenced form it is stationary. 

If a time series is stationary, we can model it in a variety of ways. 

An Autoregressive (AR) Process 

Let $Y_t$ represent the logged GDP at time t. If we model $Y_t$ as

$$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + u_t \qquad (22.2.1)$$

where $\delta$ is the mean of Y and where $u_t$ is an uncorrelated random error term with zero mean
and constant variance $\sigma^2$ (i.e., it is white noise), then we say that $Y_t$ follows a first-order
autoregressive, or AR(1), stochastic process, which we have already encountered in
Chapter 12. Here the value of Y at time t depends on its value in the previous time period
and a random term; the Y values are expressed as deviations from their mean value. In other
words, this model says that the value of Y at time t, measured from its mean, is simply some
proportion ($= \alpha_1$) of its value at time (t - 1) plus a random shock or disturbance at time t.

But if we consider this model, 

$$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + \alpha_2 (Y_{t-2} - \delta) + u_t \qquad (22.2.2)$$


5 See Pindyck and Rubinfeld, op. cit., Part III; Alan Pankratz, Forecasting with Dynamic Regression
Models, John Wiley & Sons, New York, 1991 (this is an applied book); and Andrew Harvey,
The Econometric Analysis of Time Series, 2d ed., The MIT Press, Cambridge, Mass., 1990 (this is a rather
advanced book). A thorough but accessible discussion can also be found in Terence C. Mills, Time
Series Techniques for Economists, Cambridge University Press, New York, 1990.
then we say that $Y_t$ follows a second-order autoregressive, or AR(2), process. That is, the
value of Y at time t depends on its value in the previous two time periods, the Y values being
expressed around their mean value $\delta$.

In general, we can have 

$$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + \alpha_2 (Y_{t-2} - \delta) + \cdots + \alpha_p (Y_{t-p} - \delta) + u_t \qquad (22.2.3)$$

in which case $Y_t$ is a pth-order autoregressive, or AR(p), process.
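To make Eqs. (22.2.1) to (22.2.3) concrete, the sketch below simulates an AR(p) process written in deviations from its mean $\delta$ and recovers $\alpha_1, \ldots, \alpha_p$ by least squares on the mean-deviated series. The function names, the parameter values, and the use of standard normal shocks are assumptions for illustration; the data are simulated, not the LGDP series, and a burn-in period is discarded so that the arbitrary starting values do not matter.

```python
import numpy as np

def simulate_ar(alphas, delta, n, seed=0, burn=200):
    """Y_t - delta = sum_i alpha_i (Y_{t-i} - delta) + u_t, with u_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    p = len(alphas)
    dev = np.zeros(n + burn)                 # deviations of Y from its mean delta
    u = rng.normal(size=n + burn)
    for t in range(p, n + burn):
        dev[t] = sum(a * dev[t - i - 1] for i, a in enumerate(alphas)) + u[t]
    return delta + dev[burn:]                # drop the burn-in period

def fit_ar(y, p):
    """OLS estimates of alpha_1..alpha_p from the mean-deviated series."""
    d = y - y.mean()
    X = np.column_stack([d[p - i - 1:len(d) - i - 1] for i in range(p)])  # lags 1..p
    coef, *_ = np.linalg.lstsq(X, d[p:], rcond=None)
    return coef

y = simulate_ar([0.5, 0.3], delta=10.0, n=5000, seed=5)   # an AR(2) process
alpha_hat = fit_ar(y, 2)
print("sample mean:", round(y.mean(), 2), " estimated alphas:", np.round(alpha_hat, 3))
```

Because the right-hand side contains only lagged values of Y, ordinary least squares is enough for a pure AR model; MA terms are what later call for nonlinear estimation.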

Notice that in all the preceding models only the current and previous Y values are 
involved; there are no other regressors. In this sense, we say that the “data speak for
themselves.” They are a kind of reduced form model that we encountered in our discussion of the
simultaneous-equation models. 

A Moving Average (MA) Process 

The AR process just discussed is not the only mechanism that may have generated Y. 
Suppose we model Y as follows: 

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} \qquad (22.2.4)$$

where $\mu$ is a constant and $u_t$, as before, is the white noise stochastic error term. Here Y at time
t is equal to a constant plus a moving average of the current and past error terms. Thus, in
the present case, we say that Y follows a first-order moving average, or an MA(1), process.
But if Y follows the expression

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} \qquad (22.2.5)$$

then it is an MA(2) process. More generally,

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} + \cdots + \beta_q u_{t-q} \qquad (22.2.6)$$

is an MA(q) process. In short, a moving average process is simply a linear combination of
white noise error terms.
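Since a moving average is a weighted sum of current and lagged white-noise shocks, `np.convolve` in `"valid"` mode can build Eq. (22.2.6) directly. In this hypothetical sketch $\beta_0$ is set to 1 (a common normalization that the text does not impose), and the sample autocorrelations of a simulated MA(1) series are compared with the theoretical values $\rho_1 = \beta_1/(1 + \beta_1^2)$ and $\rho_k = 0$ for $k \ge 2$. That cut-off after lag q is what makes the correlogram useful for spotting MA terms.

```python
import numpy as np

def simulate_ma(mu, betas, n, seed=0):
    """Y_t = mu + beta_0 u_t + beta_1 u_{t-1} + ... + beta_q u_{t-q}, u_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    q = len(betas) - 1
    u = rng.normal(size=n + q)               # q extra shocks to start the sum
    # 'valid' convolution gives sum_j betas[j] * u[t - j] for each of the n dates
    return mu + np.convolve(u, betas, mode="valid")

def acf(y, k):
    """Sample autocorrelation at lag k."""
    d = y - y.mean()
    return (d[k:] @ d[:len(d) - k]) / (d @ d)

beta1 = 0.6
y = simulate_ma(mu=2.0, betas=[1.0, beta1], n=20000, seed=6)   # an MA(1) process
print("lag 1:", round(acf(y, 1), 3), " theory:", round(beta1 / (1 + beta1 ** 2), 3))
print("lag 2:", round(acf(y, 2), 3), " theory: 0")
```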

An Autoregressive and Moving Average (ARMA) Process 

Of course, it is quite likely that Y has characteristics of both AR and MA and is therefore 
ARMA. Thus, $Y_t$ follows an ARMA(1,1) process if it can be written as

$$Y_t = \theta + \alpha_1 Y_{t-1} + \beta_0 u_t + \beta_1 u_{t-1} \qquad (22.2.7)$$

because there is one autoregressive and one moving average term. In Eq. (22.2.7) $\theta$
represents a constant term.

In general, in an ARMA(p, q) process, there will be p autoregressive and q moving
average terms.
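A short simulation of Eq. (22.2.7) shows how the AR and MA parts combine. Taking expectations of both sides gives $E(Y) = \theta + \alpha_1 E(Y)$, so with $|\alpha_1| < 1$ the process is stationary around the mean $\theta/(1 - \alpha_1)$; the sketch checks that the simulated series settles there. The function name and parameter values are hypothetical, and the shocks are assumed to be standard normal.

```python
import numpy as np

def simulate_arma11(theta, alpha1, beta0, beta1, n, seed=0, burn=200):
    """Y_t = theta + alpha1*Y_{t-1} + beta0*u_t + beta1*u_{t-1}, u_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n + burn)
    y = np.zeros(n + burn)
    y[0] = theta / (1 - alpha1)              # start at the stationary mean
    for t in range(1, n + burn):
        y[t] = theta + alpha1 * y[t - 1] + beta0 * u[t] + beta1 * u[t - 1]
    return y[burn:]                          # drop the burn-in period

theta, alpha1 = 1.0, 0.7
y = simulate_arma11(theta, alpha1, beta0=1.0, beta1=0.4, n=20000, seed=7)
print("sample mean:", round(y.mean(), 3),
      " implied mean theta/(1 - alpha1):", round(theta / (1 - alpha1), 3))
```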

An Autoregressive Integrated Moving Average (ARIMA) Process 

The time series models we have already discussed are based on the assumption that the 
time series involved are (weakly) stationary in the sense defined in Chapter 21. Briefly, the 
mean and variance for a weakly stationary time series are constant and its covariance is 
time-invariant. But we know that many economic time series are nonstationary, that is, they 
are integrated; for example, the economic time series introduced in Section 21.1 of
Chapter 21 are integrated.

But we also saw in Chapter 21 that if a time series is integrated of order 1 (i.e., it is I(1)),
its first differences are I(0), that is, stationary. Similarly, if a time series is I(2), its second
difference is I(0). In general, if a time series is I(d), after differencing it d times we obtain
an I(0) series.
Therefore, if we have to difference a time series d times to make it stationary and then
apply the ARMA(p, q) model to it, we say that the original time series is ARIMA(p, d, q),
that is, it is an autoregressive integrated moving average time series, where p denotes the
number of autoregressive terms, d the number of times the series has to be differenced
before it becomes stationary, and q the number of moving average terms. Thus, an
ARIMA(2, 1, 2) time series has to be differenced once (d = 1) before it becomes stationary,
and the (first-differenced) stationary time series can be modeled as an ARMA(2, 2) process,
that is, it has two AR and two MA terms. Of course, if d = 0 (i.e., a series is stationary to
begin with), ARIMA(p, d = 0, q) = ARMA(p, q). Note that an ARIMA(p, 0, 0) process
means a purely AR(p) stationary process; an ARIMA(0, 0, q) means a purely MA(q)
stationary process. Given the values of p, d, and q, one can tell what process is being modeled.
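The d in ARIMA(p, d, q) is simply the number of differencing passes. The sketch below builds an I(2) series by cumulating simulated white noise twice, confirms that `np.diff(..., n=2)` returns it to the original I(0) shocks (minus the first two values), and shows that the differencing can be undone by cumulative summation from the known starting value. That last step is how results for a model fitted to the differenced series are translated back into levels. The series are simulated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(8)
u = rng.normal(size=500)
stationary = u                         # I(0): white noise
i1 = np.cumsum(stationary)             # I(1): integrate once
i2 = np.cumsum(i1)                     # I(2): integrate twice

# Differencing d = 2 times recovers the I(0) series, minus its first two values
recovered = np.diff(i2, n=2)
print(np.allclose(recovered, stationary[2:]))

# Undoing one difference: the starting value plus the cumulative sum of the changes
first_diff = np.diff(i2)               # this is an I(1) series
rebuilt = np.concatenate([[i2[0]], i2[0] + np.cumsum(first_diff)])
print(np.allclose(rebuilt, i2))
```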

The important point to note is that to use the Box-Jenkins methodology, we must have 
either a stationary time series or a time series that is stationary after one or more
differencings. The reason for assuming stationarity can be explained as follows:

The objective of B-J [Box-Jenkins] is to identify and estimate a statistical model which can be 
interpreted as having generated the sample data. If this estimated model is then to be used for 
forecasting we must assume that the features of this model are constant through time, and
particularly over future time periods. Thus the simple reason for requiring stationary data is that
any model which is inferred from these data can itself be interpreted as stationary or stable, 
therefore providing [a] valid basis for forecasting. 6 


22.3 The Box-Jenkins (BJ) Methodology 

The million-dollar question obviously is: Looking at a time series, such as the U.S. LGDP 
series in Figure 21.1, how does one know whether it follows a purely AR process (and if so, 
what is the value of p) or a purely MA process (and if so, what is the value of q) or an 
ARMA process (and if so, what are the values of p and q) or an ARIMA process, in which 
case we must know the values of p, d, and q. The BJ methodology comes in handy in 
answering the preceding question. The method consists of four steps: 

Step 1. Identification. That is, find out the appropriate values of p, d, and q. We will 
show shortly how the correlogram and partial correlogram aid in this task. 

Step 2. Estimation. Having identified the appropriate p and q values, the next stage is 
to estimate the parameters of the autoregressive and moving average terms included 
in the model. Sometimes this calculation can be done by simple least squares but 
sometimes we will have to resort to nonlinear (in parameter) estimation methods. 

Since this task is now routinely handled by several statistical packages, we do not have 
to worry about the actual mathematics of estimation; the enterprising student may 
consult the references on that. 

Step 3. Diagnostic checking. Having chosen a particular ARIMA model, and having 
estimated its parameters, we next see whether the chosen model fits the data reasonably
well, for it is possible that another ARIMA model might do the job as well. This is
why Box-Jenkins ARIMA modeling is more an art than a science; considerable skill is 
required to choose the right ARIMA model. One simple test of the chosen model is to 
see if the residuals estimated from this model are white noise; if they are, we can 
accept the particular fit; if not, we must start over. Thus, the BJ methodology is an 
iterative process (see Figure 22.1). 


6 Michael Pokorny, An Introduction to Econometrics, Basil Blackwell, New York, 1987, p. 343. 
FIGURE 22.1
The Box-Jenkins methodology.

1. Identification of the model (choosing tentative p, d, q)
2. Parameter estimation of the chosen model
3. Diagnostic checking: Are the estimated residuals white noise?
   Yes: go to Step 4. No: return to Step 1.
4. Forecasting


Step 4. Forecasting. One of the reasons for the popularity of the ARIMA modeling 
is its success in forecasting. In many cases, the forecasts obtained by this method are 
more reliable than those obtained from the traditional econometric modeling, 
particularly for short-term forecasts. Of course, each case must be checked. 

With this general discussion, let us look at these four steps in some detail. Throughout, 
we will use the GDP data introduced in Section 21.1 (see the book’s website for the actual 
data) to illustrate the various points. 

22.4 Identification 


The chief tools in identification are the autocorrelation function (ACF), the partial 
autocorrelation function (PACF), and the resulting correlograms, which are simply the 
plots of ACFs and PACFs against the lag length. 

In the previous chapter we defined the (population) ACF (ρ_k) and the sample ACF (ρ̂_k).
The concept of partial autocorrelation is analogous to the concept of partial regression
coefficient. In the k-variable multiple regression model, the kth regression coefficient β_k
measures the rate of change in the mean value of the regressand for a unit change in the kth
regressor X_k, holding the influence of all other regressors constant.

In similar fashion, the partial autocorrelation ρ_kk measures correlation between (time
series) observations that are k time periods apart after controlling for correlations at
intermediate lags (i.e., lags less than k). In other words, partial autocorrelation is the correlation
between Y_t and Y_{t−k} after removing the effect of the intermediate Y's. 7 In Section 7.11 we
already introduced the concept of partial correlation in the regression context and showed
its relation to simple correlations. Such partial correlations are now routinely computed by
most statistical packages.
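To make the regression interpretation of ρ_kk concrete, here is a minimal NumPy sketch (the function names are hypothetical) that computes the sample ACF and obtains each ρ̂_kk as the last coefficient in an OLS regression of Y_t on a constant and Y_{t−1}, ..., Y_{t−k}. Statistical packages may use other, asymptotically equivalent algorithms (for example, Yule-Walker or Durbin-Levinson recursions), so their numbers can differ slightly in small samples. The data are a simulated AR(1) series, not LGDP.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelations rho_hat_0, ..., rho_hat_nlags (rho_hat_0 = 1)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom for k in range(nlags + 1)])

def sample_pacf(y, nlags):
    """rho_hat_kk = coefficient on Y_{t-k} in an OLS regression of Y_t
    on a constant and Y_{t-1}, ..., Y_{t-k}."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    pacf = [1.0]
    for k in range(1, nlags + 1):
        lags = [y[k - j:n - j] for j in range(1, k + 1)]  # Y_{t-1}, ..., Y_{t-k}
        X = np.column_stack([np.ones(n - k)] + lags)
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        pacf.append(coef[-1])
    return np.array(pacf)

def approx_95_band(n):
    """Approximate 95% limits for a single rho_hat_k under white noise: +/- 1.96/sqrt(n)."""
    return 1.96 / np.sqrt(n)

# Simulated AR(1) with coefficient 0.7 (illustrative data only).
rng = np.random.default_rng(1)
n = 5000
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + e[t]

acf = sample_acf(y, 5)
pacf = sample_pacf(y, 5)
band = approx_95_band(n)
```

For this series the ACF should decay roughly like 0.7^k, while the PACF should be close to zero beyond lag 1, which is the AR pattern summarized in Table 22.1.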

In Figure 22.2 we show the correlogram (panel a) and partial correlogram (panel b) of
the LGDP series. From this figure, two facts stand out: First, the ACF declines very slowly;
as shown in Figure 21.8, the autocorrelations up to about 22 lags are individually statistically
significantly different from zero, for they all lie outside the 95 percent confidence bounds.
Second, after the second lag, the PACF drops dramatically, and most PACFs after lag 2 are
statistically insignificant, save perhaps for lag 13.

7 In time series data a large proportion of the correlation between Y_t and Y_{t−k} may be due to the
correlations they have with the intervening lags Y_{t−1}, Y_{t−2}, ..., Y_{t−k+1}. The partial correlation ρ_kk
removes the influence of these intervening variables.
FIGURE 22.2
(a) Correlogram and (b) partial correlogram for LGDP, United States, 1947-I to 2007-IV.
[Panel (a): sample ACF by lag, with Bartlett's formula for MA(q) 95% confidence bands.
Panel (b): sample PACF by lag, with 95% confidence bands, se = 1/sqrt(n).]

Since the U.S. LGDP time series is not stationary, we have to make it stationary before we
can apply the Box-Jenkins methodology. In Figure 21.9 we plotted the first differences of
LGDP. Unlike the LGDP series in Figure 21.1, we do not observe any trend in this series,
perhaps suggesting that the first-differenced LGDP time series is stationary. 8 A formal
application of the Dickey-Fuller unit root test shows that that is indeed the case. We can also
see this visually from the estimated ACF and PACF correlograms given in panels (a) and (b)
of Figure 22.3. Now we have a much different pattern of ACF and PACF. The ACFs at lags 1,
2, and 5 seem statistically different from zero; recall from Chapter 21 that the approximate
95 percent confidence limits for ρ̂_k are −0.1254 and +0.1254. (Note: As discussed in Chapter 21, these

8 It is hard to tell whether the variance of this series is stationary, especially around 1979-1980. The oil
embargo of 1979 and a significant change in the Fed's monetary policy in 1979 may have something
to do with our difficulty.
FIGURE 22.3
(a) Correlogram and (b) partial correlogram for first differences of LGDP, United States,
1947-I to 2007-IV.
[Panel (a): sample ACF by lag, with Bartlett's formula for MA(q) 95% confidence bands.
Panel (b): sample PACF by lag, with 95% confidence bands, se = 1/sqrt(n).]

confidence limits are asymptotic and so can be considered approximate.) But at all other 
lags, they are not statistically different from zero. For the partial autocorrelations, only lags 1 
and 12 seem to be statistically different from zero. 

Now how do the correlograms given in Figure 22.3 enable us to find the ARMA pattern
of the LGDP time series? (Note: We will consider only the first-differenced LGDP series
because it is stationary.) One way of accomplishing this is to consider the ACF and PACF and
the associated correlograms of a selected number of ARMA processes, such as AR(1),
AR(2), MA(1), MA(2), ARMA(1, 1), ARMA(2, 2), and so on. Since each of these stochastic
processes exhibits typical patterns of ACF and PACF, if the time series under study fits one
of these patterns we can identify the time series with that process. Of course, we will have to
apply diagnostic tests to find out if the chosen ARMA model is reasonably accurate.

To study the properties of the various standard ARIMA processes would consume a lot 
of space. What we plan to do is to give general guidelines (see Table 22.1); the references 
can give the details of the various stochastic processes. 
TABLE 22.1
Theoretical Patterns of ACF and PACF

Type of Model   Typical Pattern of ACF                          Typical Pattern of PACF
AR(p)           Decays exponentially or with damped sine        Significant spikes through lags p
                wave pattern or both
MA(q)           Significant spikes through lags q               Declines exponentially
ARMA(p, q)      Exponential decay                               Exponential decay

Note: The terms exponential and geometric decay mean the same thing (recall our discussion of the Koyck distributed lag).


FIGURE 22.4 ACF and PACF of selected stochastic processes: (a) AR(2): α1 = 0.5, α2 = 0.3; (b) MA(2): β1 = 0.5,
β2 = 0.3; (c) ARMA(1, 1): α1 = 0.5, β1 = 0.5.
[Each panel plots ρ_k (ACF) and ρ_kk (PACF) against the lag.]

Notice that the ACFs and PACFs of AR(p) and MA(q) processes have opposite patterns;
in the AR(p) case the ACF declines geometrically or exponentially but the PACF cuts off
after a certain number of lags, whereas the opposite happens to an MA(q) process.
Graphically, these patterns are shown in Figure 22.4.
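The theoretical patterns in Table 22.1 can be reproduced for the parameter values used in panels (a) and (b) of Figure 22.4. The NumPy sketch below (function names are hypothetical) computes the AR(2) autocorrelations from the Yule-Walker recursion ρ_k = α1ρ_{k−1} + α2ρ_{k−2}, the MA(q) autocorrelations from the moving average weights, and the partial autocorrelations from the autocorrelations by the Durbin-Levinson recursion. It assumes the MA(q) process is written Y_t = u_t + β1u_{t−1} + ... + βq u_{t−q}.

```python
import numpy as np

def ar2_acf(a1, a2, nlags):
    """Theoretical ACF of a stationary AR(2): Y_t = a1*Y_{t-1} + a2*Y_{t-2} + u_t."""
    rho = np.empty(nlags + 1)
    rho[0] = 1.0
    rho[1] = a1 / (1 - a2)  # first Yule-Walker equation
    for k in range(2, nlags + 1):
        rho[k] = a1 * rho[k - 1] + a2 * rho[k - 2]
    return rho

def ma_acf(betas, nlags):
    """Theoretical ACF of Y_t = u_t + b_1*u_{t-1} + ... + b_q*u_{t-q}; zero beyond lag q."""
    theta = np.r_[1.0, betas]
    q = len(theta) - 1
    rho = np.zeros(nlags + 1)
    for k in range(min(q, nlags) + 1):
        rho[k] = np.sum(theta[:q + 1 - k] * theta[k:]) / np.sum(theta ** 2)
    return rho

def pacf_from_acf(rho):
    """Durbin-Levinson recursion: partial autocorrelations phi_kk from rho_0..rho_K."""
    K = len(rho) - 1
    pacf = np.zeros(K + 1)
    pacf[0] = 1.0
    phi = np.zeros(0)  # holds phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, K + 1):
        phi_kk = (rho[k] - phi @ rho[k - 1:0:-1]) / (1 - phi @ rho[1:k])
        phi = np.r_[phi - phi_kk * phi[::-1], phi_kk]
        pacf[k] = phi_kk
    return pacf

# Parameter values from Figure 22.4, panels (a) and (b).
acf_ar, acf_ma = ar2_acf(0.5, 0.3, 10), ma_acf([0.5, 0.3], 10)
pacf_ar, pacf_ma = pacf_from_acf(acf_ar), pacf_from_acf(acf_ma)
```

For the AR(2) case the PACF equals 0.3 at lag 2 and is zero from lag 3 on, while the ACF decays gradually; for the MA(2) case the ACF is zero beyond lag 2, while the PACF does not cut off (it is about −0.13 at lag 3).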

A Warning 

Since in practice we do not observe the theoretical ACFs and PACFs and rely on their sample
counterparts, the estimated ACFs and PACFs will not exactly match their theoretical
counterparts. What we are looking for is the resemblance between theoretical and sample 
ACFs and PACFs so that they can point us in the right direction in constructing ARIMA 
models. And that is why ARIMA modeling requires a great deal of skill, which of course 
comes from practice. 

ARIMA Identification of U.S. GDP

Returning to the correlogram and partial correlogram of the stationary (after
first-differencing) U.S. LGDP for 1947-I to 2007-IV given in Figure 22.3, what do we see?

Remembering that the ACF and PACF are sample quantities, we do not have a nice pattern
as suggested in Table 22.1. The autocorrelations (panel a) decline for the first two lags
and then, with the exception of lag 5, the rest of them are not statistically different from
zero (the gray area shown in the figures gives the approximate 95 percent confidence
limits). The partial autocorrelations (panel b) with spikes at lags 1 and 12 seem statistically
significant but the rest are not; if the partial correlation coefficient were significant only at
lag 1, we could have identified this as an AR(1) model. Since the ACF shows significant
spikes at the first two lags and (lag 5 apart) cuts off thereafter, which is the MA(2) pattern
in Table 22.1, let us assume that the process that generated the (first-differenced) LGDP
series is an MA(2) process. Keep in mind that unless the ACF and PACF are well defined,
it is hard to choose a model without trial and error. The reader is encouraged to try other
ARIMA models on the first-differenced LGDP series.

22.5 Estimation of the ARIMA Model 


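Let Y* denote the first differences of U.S. logged GDP. Then our tentatively identified MA
model is

$$Y_t^* = \mu + \beta_1 u_{t-1} + \beta_2 u_{t-2} + u_t \qquad (22.5.1)$$

Using MINITAB, we obtained the following estimates:

$$\begin{aligned}
\hat{Y}_t^* &= 0.00822 + 0.2918\,u_{t-1} + 0.2024\,u_{t-2} \\
\text{se} &= (0.00088) \qquad (0.0633) \qquad (0.0634) \\
t &= (9.32) \qquad\quad (4.61) \qquad\quad (3.20) \\
R^2 &= 0.1217 \qquad d = 1.9705
\end{aligned} \qquad (22.5.2)$$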

We leave it as an exercise for the reader to estimate other ARIMA models for the first- 
differenced LGDP series. 
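Estimating an MA model is nonlinear because the u's are not observed. One simple least-squares route, sketched below, is the Hannan-Rissanen two-stage procedure: fit a long autoregression, treat its residuals as estimates of the u's, and then regress Y*_t on a constant and the lagged residuals. This is a swapped-in technique for illustration, not the routine MINITAB used for Eq. (22.5.2); the AR length of 20, the function names, and the simulated parameter values (chosen to resemble Eq. 22.5.2) are all assumptions.

```python
import numpy as np

def lagged_columns(x, lags, start):
    """Columns x_{t-1}, ..., x_{t-lags} for t = start, ..., len(x) - 1."""
    n = len(x)
    return np.column_stack([x[start - j:n - j] for j in range(1, lags + 1)])

def hannan_rissanen_ma(y, q, long_ar=20):
    """Two-stage least squares for y_t = mu + u_t + b_1 u_{t-1} + ... + b_q u_{t-q}.
    Returns the array [mu, b_1, ..., b_q]."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Stage 1: a long autoregression approximates the MA process; its residuals proxy u_t.
    X1 = np.column_stack([np.ones(n - long_ar), lagged_columns(y, long_ar, long_ar)])
    b1, *_ = np.linalg.lstsq(X1, y[long_ar:], rcond=None)
    u_hat = np.full(n, np.nan)
    u_hat[long_ar:] = y[long_ar:] - X1 @ b1
    # Stage 2: OLS of y_t on a constant and the lagged residual proxies.
    start = long_ar + q
    X2 = np.column_stack([np.ones(n - start), lagged_columns(u_hat, q, start)])
    b2, *_ = np.linalg.lstsq(X2, y[start:], rcond=None)
    return b2

# Simulated MA(2) for a first-differenced series (illustrative values, not LGDP).
rng = np.random.default_rng(2)
n = 20000
u = rng.normal(scale=0.01, size=n + 2)
y = 0.008 + u[2:] + 0.29 * u[1:-1] + 0.20 * u[:-2]
est = hannan_rissanen_ma(y, q=2)
```

With 20,000 simulated observations the estimates should be close to the true values 0.008, 0.29, and 0.20; with a sample the size of the LGDP series (about 240 quarters) they would be much noisier.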

22.6 Diagnostic Checking 

How do we know that the model in Eq. (22.5.2) is a reasonable fit to the data? One simple
diagnostic is to obtain residuals from Eq. (22.5.2) and obtain the ACF and PACF of these
residuals, say, up to lag 25. The estimated ACF and PACF are shown in Figure 22.5. As this figure
shows, none of the autocorrelations (panel a) and partial autocorrelations (panel b) is
individually statistically significant. Nor is the sum of the 25 squared autocorrelations, as shown
by the Box-Pierce Q and Ljung-Box (LB) statistics (see Chapter 21), statistically
significant. In other words, the correlograms of both autocorrelation and partial autocorrelation
give the impression that the residuals estimated from Eq. (22.5.2) are purely random. Hence,
there may not be any need to look for another ARIMA model.
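The Box-Pierce and Ljung-Box statistics used in this check can be computed directly from the residual autocorrelations: Q = n Σ ρ̂_k² and LB = n(n + 2) Σ ρ̂_k²/(n − k), summing over k = 1, ..., m. The NumPy sketch below (function name hypothetical) returns both; it uses simulated series rather than the residuals of Eq. (22.5.2).

```python
import numpy as np

def q_statistics(resid, m):
    """Box-Pierce Q and Ljung-Box LB statistics over lags 1..m of a residual series."""
    e = np.asarray(resid, dtype=float)
    n = len(e)
    dev = e - e.mean()
    denom = np.sum(dev ** 2)
    rho = np.array([np.sum(dev[k:] * dev[:n - k]) / denom for k in range(1, m + 1)])
    lags = np.arange(1, m + 1)
    q_bp = n * np.sum(rho ** 2)
    q_lb = n * (n + 2) * np.sum(rho ** 2 / (n - lags))
    return q_bp, q_lb

rng = np.random.default_rng(4)
white = rng.normal(size=2000)        # stands in for well-behaved (white noise) residuals
persistent = np.zeros(2000)          # AR(1) with coefficient 0.9: clearly not white noise
for t in range(1, 2000):
    persistent[t] = 0.9 * persistent[t - 1] + white[t]

q_white = q_statistics(white, 25)
q_persistent = q_statistics(persistent, 25)
```

Each statistic is compared with the chi-square distribution; with m = 25 lags the 5 percent critical value is about 37.65. When the statistics are computed from the residuals of an estimated ARMA(p, q) model, the degrees of freedom are commonly reduced to m − p − q.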

22.7 Forecasting 


Remember that the GDP data are for the period 1947-I to 2007-IV. Suppose, on the basis of
model (22.5.2), we want to forecast LGDP for the first four quarters of 2008. But in
Eq. (22.5.2) the dependent variable is the change in LGDP over the previous quarter.
Therefore, if we use Eq. (22.5.2), what we can obtain are the forecasts of LGDP changes between
the first quarter of 2008 and the fourth quarter of 2007, the second quarter of 2008 over the
first quarter of 2008, etc.

To obtain the forecast of the LGDP level rather than its changes, we can "undo" the
first-difference transformation that we had used to obtain the changes. (More technically, we
integrate the first-differenced series.) Thus, to obtain the forecast value of LGDP (not
ΔLGDP) for 2008-I, we rewrite model (22.5.1), with Y now denoting LGDP itself, as

$$Y_{2008\text{-I}} - Y_{2007\text{-IV}} = \mu + \beta_1 u_{2007\text{-IV}} + \beta_2 u_{2007\text{-III}} + u_{2008\text{-I}} \qquad (22.7.1)$$
FIGURE 22.5
(a) Correlogram and (b) partial correlogram for residuals of MA(2) model for the first
differences of LGDP, United States, 1947-I to 2007-IV.
[Panel (a): residual ACF by lag, with Bartlett's formula for MA(q) 95% confidence bands.
Panel (b): residual PACF by lag, with 95% confidence bands, se = 1/sqrt(n).]

that is,

$$Y_{2008\text{-I}} = \mu + \beta_1 u_{2007\text{-IV}} + \beta_2 u_{2007\text{-III}} + u_{2008\text{-I}} + Y_{2007\text{-IV}} \qquad (22.7.2)$$

The values of μ, β1, and β2 are already known from the estimated regression (22.5.2). The
value of u_2008-I is assumed to be zero (why?). Therefore, we can easily obtain the forecast
value of Y_2008-I. The numerical estimate of this forecast value is: 9

$$\begin{aligned}
\hat{Y}_{2008\text{-I}} &= 0.00822 + (0.2918)\,u_{2007\text{-IV}} + (0.2024)\,u_{2007\text{-III}} + Y_{2007\text{-IV}} \\
&= 0.00822 + (0.2918)(0.00853) + (0.2024)(-0.00399) + 9.3653 \\
&= 9.3741 \text{ (approx.)}
\end{aligned}$$

Although standard computer packages do this computation routinely, we show the detailed 
calculations to illustrate the mechanics involved. 
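The "undo the differencing" step can be written as a short loop: each forecast change from the MA(2) model is added to the previous level, with future shocks set to their expected value of zero. The sketch below is a hypothetical helper using NumPy; it assumes the last two residuals are known, as in the calculation above. From the third quarter ahead both lagged shocks lie in the future, so the forecast change reduces to μ.

```python
import numpy as np

def ma2_level_forecasts(y_last, u_last, u_prev, mu, b1, b2, horizon):
    """Forecast levels Y_{T+1}, ..., Y_{T+horizon} when the first difference follows
    dY_t = mu + u_t + b1*u_{t-1} + b2*u_{t-2}. u_T = u_last and u_{T-1} = u_prev are
    known; all future shocks are replaced by their expected value, zero."""
    known = {0: u_last, -1: u_prev}  # shocks indexed relative to period T
    level = y_last
    path = []
    for h in range(1, horizon + 1):
        # The forecast change for T+h uses u_{T+h-1} and u_{T+h-2} (zero if in the future).
        change = mu + b1 * known.get(h - 1, 0.0) + b2 * known.get(h - 2, 0.0)
        level += change  # integrate: add the change to the previous level
        path.append(level)
    return np.array(path)

# Inputs as printed in the text: Eq. (22.5.2) coefficients, the residuals for
# 2007-IV and 2007-III, and LGDP for 2007-IV.
path = ma2_level_forecasts(y_last=9.3653, u_last=0.00853, u_prev=-0.00399,
                           mu=0.00822, b1=0.2918, b2=0.2024, horizon=4)
```

To turn a forecast of LGDP into a forecast of real GDP, take the antilog; for example, `np.exp(9.3741)` is about 11,779 (billions of 2000 dollars).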
Thus the forecast value of LGDP for 2008-I is about 9.3741, whose antilog is about $11,779 billion
(2000 dollars). Incidentally, the actual value of real GDP for 2008-I was $11,693.09 billion;
the forecast overestimated the actual value by about $86 billion.

22.8 Further Aspects of the BJ Methodology 

In the preceding paragraphs we have provided but a sketchy introduction to BJ modeling.
There are many aspects of this methodology that we have not considered for lack of
space, for example, seasonality. Many time series exhibit seasonal behavior. Examples are
sales by department stores in conjunction with major holidays, seasonal consumption of ice
cream, travel during public holidays, etc. If, for example, we had data on department
store sales by quarters, the sales figures would show spikes in the fourth quarter. In such
situations, one can remove the seasonal influence by taking four-quarter (seasonal) differences of
the sales figures, that is, Y_t − Y_{t−4}, and then decide what kind of ARIMA model to fit.
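A minimal sketch of four-quarter differencing, assuming NumPy; the function name and the sales series (a linear trend plus a fixed fourth-quarter spike) are hypothetical. Ordinary first differences still carry the quarterly spike, whereas four-quarter differences remove it.

```python
import numpy as np

def seasonal_difference(y, s=4):
    """Return y_t - y_{t-s}; with quarterly data, s = 4 removes a stable seasonal pattern."""
    y = np.asarray(y, dtype=float)
    return y[s:] - y[:-s]

# Hypothetical quarterly sales: upward trend plus a spike every fourth quarter.
t = np.arange(24)
sales = 100 + 2 * t + np.where(t % 4 == 3, 30, 0)

first_diff = np.diff(sales)            # still shows +32 / -28 swings around the fourth quarter
seasonal_diff = seasonal_difference(sales)  # constant: four quarters of trend growth, 2 * 4
```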

We have analyzed only a single time series at a time. But nothing prevents the BJ 
methodology from being extended to the simultaneous study of two or more time series. 
A foray into that topic would take us far afield. The interested reader may want to consult 
the references. 10 In the following section, however, we discuss this topic in the context of 
what is known as vector autoregression. 

22.9 Vector Autoregression (VAR) 

In Chapters 18 to 20 we considered simultaneous, or structural, equation models. In such 
models some variables are treated as endogenous and some as exogenous or predetermined 
(exogenous plus lagged endogenous). Before we estimate such models, we have to make 
sure that the equations in the system are identified (either exactly or over-). This identification
is often achieved by assuming that some of the predetermined variables are present
only in some equations. This decision is often subjective and has been severely criticized 
by Christopher Sims. 11 

According to Sims, if there is true simultaneity among a set of variables, they should all be 
treated on an equal footing; there should not be any a priori distinction between endogenous 
and exogenous variables. It is in this spirit that Sims developed his VAR model. 

The seeds of this model were already sown in the Granger causality test discussed in 
Chapter 17. In Eqs. (17.14.1) and (17.14.2), which explain current LGDP in terms of 
lagged money supply and lagged LGDP and current money supply in terms of lagged 
money supply and lagged LGDP, respectively, we are essentially treating LGDP and money 
supply as a pair of endogenous variables. There are no exogenous variables in this system. 

Similarly, in Example 17.13 we examined the nature of causality between money and 
interest rate in Canada. In the money equation, only the lagged values of money and interest
rate appear, and in the interest rate equation only the lagged values of interest rate and
money appear. 

Both these examples are illustrations of vector autoregressive models; the term 
autoregressive is due to the appearance of the lagged value of the dependent variable on the 
right-hand side and the term vector is due to the fact that we are dealing with a vector of 
two (or more) variables. 


10 For an accessible treatment of this subject, see Terence C. Mills, op. cit., Part III.
11 C. A. Sims, "Macroeconomics and Reality," Econometrica, vol. 48, 1980, pp. 1-48.
Estimation of VAR

Returning to the Canadian money-interest rate example, we saw that when we introduced
six lags of each variable as regressors, the causality tests indicated bilateral causality
between money (M1) and interest rate, R (90-day corporate interest rate).
That is, M1 affects R and R affects M1. These kinds of situations are ideally suited for the
application of VAR.

To explain how a VAR is estimated, we will continue with the preceding example. For
now we assume that each equation contains k lagged values of M (as measured by M1) and R.
In this case, one can estimate each of the following equations by OLS. 12

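$$M_t = \alpha + \sum_{j=1}^{k} \beta_j M_{t-j} + \sum_{j=1}^{k} \gamma_j R_{t-j} + u_{1t} \qquad (22.9.1)$$

$$R_t = \alpha' + \sum_{j=1}^{k} \theta_j M_{t-j} + \sum_{j=1}^{k} \gamma'_j R_{t-j} + u_{2t} \qquad (22.9.2)$$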

where the u’s are the stochastic error terms, called impulses or innovations or shocks in 
the language of VAR. 
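Because every equation in (22.9.1) and (22.9.2) has the same right-hand-side variables, the whole VAR can be estimated by running OLS equation by equation, which a single NumPy least-squares call with a matrix of dependent variables does at once. The sketch below assumes a (T × m) data array with one column per variable; the function name and the simulated two-variable system are hypothetical, not the Canadian money-interest rate data.

```python
import numpy as np

def fit_var_ols(data, k):
    """Equation-by-equation OLS for a VAR(k) with a constant.
    data: (T, m) array. Returns (coefs, resid); coefs has shape (m, 1 + m*k), one row per
    equation, ordered [constant, all variables at lag 1, ..., all variables at lag k]."""
    data = np.asarray(data, dtype=float)
    T, m = data.shape
    X = np.column_stack([np.ones(T - k)] + [data[k - j:T - j] for j in range(1, k + 1)])
    Y = data[k:]
    # lstsq with a matrix right-hand side solves each column (equation) separately.
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    return B.T, resid

# Simulated two-variable VAR(1); coefficient values are illustrative only.
rng = np.random.default_rng(3)
A = np.array([[0.5, 0.1], [-0.2, 0.4]])
c = np.array([1.0, 0.5])
T = 5000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = c + A @ x[t - 1] + rng.normal(size=2)

coefs, resid = fit_var_ols(x, k=1)
```

With m = 2 variables and k = 4 lags, each row of `coefs` would hold 1 + 2 × 4 = 9 coefficients, the same count as in each column of Table 22.2.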

Before we estimate Eqs. (22.9.1) and (22.9.2) we have to decide on the maximum lag 
length, k. This is an empirical question. We have 40 observations in all. Including too many 
lagged terms will consume degrees of freedom, not to mention introducing the possibility 
of multicollinearity. Including too few lags will lead to specification errors. One way of
deciding this question is to use a criterion such as the Akaike or Schwarz information
criterion and choose the model that gives the lowest value of that criterion. There is no
question that some trial and error is inevitable.

To illustrate the mechanics, we initially used four lags (k = 4) of each variable and, using
EViews 6, we obtained the estimates of the parameters of the preceding two equations, which
are given in Table 22.2. Note that although our sample runs from 1979-I to 1988-IV, we
used the sample for the period 1980-I to 1987-IV and saved the last four observations to
check the forecasting accuracy of the fitted VAR.

Since the preceding equations are OLS regressions, the output of the regression given in
Table 22.2 is to be interpreted in the usual fashion. Of course, with several lags of the same
variables, not every estimated coefficient will be statistically significant, possibly because
of multicollinearity. But collectively, they may be significant on the basis of the standard
F test.

Let us examine the results presented in Table 22.2. First consider the M1 regression.
Individually, only M1 at lag 1 and R at lags 1 and 2 are statistically significant. But the F value is
so high that we can reject the hypothesis that all the lagged terms are collectively (jointly)
equal to zero. Turning to the interest rate regression, we see that all four lagged
money terms are individually statistically significant (at the 10 percent or better level),
whereas only the 1-period lagged interest rate variable is significant.
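The F value reported for each equation tests the hypothesis that all slope coefficients (here, all the lagged terms) are jointly zero. In terms of R², F = [R²/(k − 1)]/[(1 − R²)/(n − k)], where k counts all estimated coefficients including the constant. The small function below (name hypothetical) applies this standard formula; with n = 32 and k = 9 it reproduces the F values in Table 22.2 up to the rounding of R².

```python
def overall_f(r2, n, k):
    """F statistic for H0: all slope coefficients are zero.
    r2: R-squared; n: observations; k: estimated coefficients including the constant."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Table 22.2: n = 32 observations, k = 9 coefficients (8 lagged terms + constant).
f_money = overall_f(0.988154, n=32, k=9)   # M1 equation
f_rate = overall_f(0.852890, n=32, k=9)    # R equation
```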

For comparative purposes, we present in Table 22.3 the VAR results based on only 2 lags
of each endogenous variable. Here you will see that in the money regression the
1-period-lagged money variable and both lagged interest rate terms are individually
statistically significant. In the interest rate regression, both lagged money terms (at about
the 5 percent level) and one lagged interest term are individually significant.

12 One can use the SURE (seemingly unrelated regression) technique to estimate the two equations
together. However, since each equation contains the same regressors (the same lagged endogenous
variables), OLS estimation of each equation separately produces identical (and efficient) estimates.
TABLE 22.2
Vector Autoregression Estimates Based on 4 Lags

Sample (adjusted): 1980-I to 1987-IV
Included observations: 32 after adjusting endpoints
Standard errors in ( ) and t statistics in [ ]

                               M1                                  R
M1(-1)        1.076737 (0.20174) [5.33733]       0.001282 (0.00067) [1.90083]
M1(-2)        0.173433 (0.31444) [0.55157]      -0.002140 (0.00105) [-2.03584]
M1(-3)       -0.366465 (0.34687) [-1.05648]      0.002176 (0.00116) [1.87699]
M1(-4)        0.077602 (0.20789) [0.37329]      -0.001479 (0.00069) [-2.12855]
R(-1)      -275.0293 (57.2174) [-4.80675]        1.139310 (0.19127) [5.95670]
R(-2)       227.1750 (95.3947) [2.38142]        -0.309053 (0.31888) [-0.96917]
R(-3)         8.511851 (96.9176) [0.08783]       0.052361 (0.32397) [0.16162]
R(-4)       -50.19926 (64.7554) [-0.77521]       0.001076 (0.21646) [0.00497]
C          2413.827 (1622.65) [1.48759]          4.919000 (5.42416) [0.90687]

                                M1            R
R-squared                   0.988154      0.852890
Adj. R-squared              0.984034      0.801721
Sum square residuals        4820241.      53.86233
SE equation                 457.7944      1.530307
F statistic                 239.8315      16.66815
Log likelihood             -236.1676     -53.73716
Akaike AIC                  15.32298      3.921073
Schwarz SC                  15.73521      4.333311
Mean dependent              28514.53      11.67292
SD dependent                3623.058      3.436688

Determinant residual covariance     490782.3
Log likelihood (df adjusted)       -300.4722
Akaike information criterion        19.90451
Schwarz criterion                   20.72899

If we have to make a choice between the model given in Table 22.2 and that given in 
Table 22.3, which would we choose? The Akaike and Schwarz information values for the 
model in Table 22.2 are, respectively, 15.32 and 15.73, whereas the corresponding values for 
Table 22.3 are 15.10 and 15.33. Since the lower the values of the Akaike and Schwarz statistics,
the better the model, on that basis it seems the more parsimonious model given in Table 22.3 
is preferable. We also considered 6 lags of each of the endogenous variables and found that 
the values of Akaike and Schwarz statistics were 15.37 and 15.98, respectively. Again, the 
choice seems to be the model with two lagged terms of each endogenous variable, that is, the 
model in Table 22.3. 
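The equation-level Akaike and Schwarz values in Tables 22.2 and 22.3 are consistent with the per-observation definitions AIC = −2 ln L/n + 2k/n and SC = −2 ln L/n + k ln(n)/n, where ln L = −(n/2)[1 + ln(2π) + ln(SSR/n)] is the Gaussian log likelihood of an OLS equation with k coefficients. The sketch below (function names hypothetical) recomputes them from the sums of squared residuals; other packages scale these criteria differently, so comparisons are meaningful only within one convention.

```python
import math

def ols_loglik(ssr, n):
    """Gaussian log likelihood of an OLS equation, evaluated at the ML error variance SSR/n."""
    return -0.5 * n * (1 + math.log(2 * math.pi) + math.log(ssr / n))

def info_criteria(ssr, n, k):
    """Return (log likelihood, AIC, SC) in per-observation form."""
    ll = ols_loglik(ssr, n)
    aic = -2 * ll / n + 2 * k / n
    sc = -2 * ll / n + k * math.log(n) / n
    return ll, aic, sc

# M1 equations: 4 lags (Table 22.2) versus 2 lags (Table 22.3).
ll4, aic4, sc4 = info_criteria(4820241.0, n=32, k=9)
ll2, aic2, sc2 = info_criteria(5373510.0, n=34, k=5)
print(aic2 < aic4 and sc2 < sc4)  # True: both criteria favor the 2-lag model
```

Strictly, such comparisons are cleanest when all candidate lag lengths are fitted to the same observations; here the 4-lag and 2-lag models use 32 and 34 observations, respectively.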

Forecasting with VAR 

Suppose we choose the model given in Table 22.3. We can use it for the purpose of
forecasting the values of M1 and R. Remember that our data cover the period 1979-I to
1988-IV, but we have not used the values for 1988 in estimating the VAR models. Now
suppose we want to forecast the value of M1 for 1988-I, that is, the first quarter of 1988. The
forecast value for 1988-I can be obtained as follows:

$$\hat{M}_{1988\text{-I}} = 1451.977 + 1.0375\,M_{1987\text{-IV}} - 0.0446\,M_{1987\text{-III}} - 234.8850\,R_{1987\text{-IV}} + 160.1560\,R_{1987\text{-III}}$$
TABLE 22.3
Vector Autoregression Estimates Based on 2 Lags

Sample (adjusted): 1979-III to 1987-IV
Included observations: 34 after adjusting endpoints
Standard errors in ( ) and t statistics in [ ]

                               M1                                  R
M1(-1)        1.037537 (0.16048) [6.46509]       0.001091 (0.00059) [1.85825]
M1(-2)       -0.044661 (0.15591) [-0.28646]     -0.001255 (0.00057) [-2.19871]
R(-1)      -234.8850 (45.5224) [-5.15977]        1.069081 (0.16660) [6.41708]
R(-2)       160.1560 (48.5283) [3.30026]        -0.223364 (0.17760) [-1.25768]
C          1451.977 (1185.59) [1.22468]          5.796434 (4.33894) [1.33591]

                                M1            R
R-squared                   0.988198      0.806660
Adj. R-squared              0.986571      0.779993
Sum square residuals        5373510.      71.97054
SE equation                 430.4573      1.575355
F statistic                 607.0720      30.24878
Log likelihood             -251.7446     -60.99215
Akaike AIC                  15.10263      3.881891
Schwarz SC                  15.32709      4.106356
Mean dependent              28216.26      11.75049
SD dependent                3714.506      3.358613

Determinant residual covariance     458485.4
Log likelihood (df adjusted)       -318.0944
Akaike information criterion        19.29967
Schwarz criterion                   19.74860



where the coefficient values are obtained from Table 22.3. Now using the appropriate values
of M1 and R from Table 17.5, the forecast value of money for the first quarter of 1988
can be seen to be 36,996 (millions of Canadian dollars). The actual value of M1 for 1988-I
was 36,480, which means that our model overpredicted the actual value by about 516
(millions of dollars), which is about 1.4 percent of the actual M1 for 1988-I. Of course,
these estimates will change, depending on how many lagged values we consider in the VAR
model. It is left as an exercise for the reader to forecast the value of R for the first quarter
of 1988 and compare it with its actual value for that quarter.
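The forecasting calculation for 1988-I is a direct plug-in of the Table 22.3 coefficients. The sketch below (plain Python; the names are hypothetical) encodes both equations of the 2-lag VAR, so the same function also gives the R forecast asked for in the exercise. The 1987-III and 1987-IV values of M1 and R are in Table 17.5 and are not reproduced here, so no forecast is computed below; only the relative error quoted in the text is checked.

```python
# Coefficients of the 2-lag VAR in Table 22.3 (M1 and R equations).
TABLE_22_3 = {
    "M1": {"const": 1451.977, "M1_lag1": 1.037537, "M1_lag2": -0.044661,
           "R_lag1": -234.8850, "R_lag2": 160.1560},
    "R": {"const": 5.796434, "M1_lag1": 0.001091, "M1_lag2": -0.001255,
          "R_lag1": 1.069081, "R_lag2": -0.223364},
}

def var2_one_step(m1_lag1, m1_lag2, r_lag1, r_lag2, coefs=TABLE_22_3):
    """One-step-ahead forecasts of M1 and R from the two most recent quarters
    (lag1 = latest quarter, e.g. 1987-IV; lag2 = the quarter before, e.g. 1987-III)."""
    return {
        eq: b["const"] + b["M1_lag1"] * m1_lag1 + b["M1_lag2"] * m1_lag2
        + b["R_lag1"] * r_lag1 + b["R_lag2"] * r_lag2
        for eq, b in coefs.items()
    }

def percent_error(forecast, actual):
    """Forecast error as a percentage of the actual value (positive = overprediction)."""
    return 100.0 * (forecast - actual) / actual

# Figures quoted in the text for 1988-I: forecast 36,996, actual 36,480.
print(round(percent_error(36996, 36480), 2))  # 1.41
```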


VAR and Causality 

You may recall that we discussed the topic of causality in Chapter 17. There we considered
the Granger and Sims tests of causality. Is there any connection between VAR and
causality? In Chapter 17 (Section 17.14) we saw that up to 2, 4, and 6 lags there was bilateral
causality between M1 and R, but at lag 8 there was no causality between the two variables.
Thus, the results are mixed. Now you may recall from Chapter 21 the Granger representation
theorem. One of the implications of this theorem is that if two variables, say, X_t and
Y_t, are cointegrated and each is individually I(1), that is, integrated of order 1 (i.e., each
is individually nonstationary), then either X_t must Granger-cause Y_t or Y_t must
Granger-cause X_t.

In our illustrative example this means that if M1 and R are individually I(1) but are
cointegrated, then either M1 must Granger-cause R or R must Granger-cause M1. This means we
must first find out if the two variables are I(1) individually and then find out if they are
cointegrated. If this is not the case, then the whole question of causality may become
moot. In Exercise 22.22, the reader is asked to find out if the two variables are
nonstationary but are cointegrated. If you do the exercise, you will find that there is some weak
evidence of cointegration between M1 and R, which is why the causality tests discussed in
Section 17.14 were equivocal.


Some Problems with VAR Modeling 

The advocates of VAR emphasize these virtues of the method: (1) The method is simple; 
one does not have to worry about determining which variables are endogenous and which 
ones are exogenous. All variables in VAR are endogenous. 13 (2) Estimation is simple; that 
is, the usual OLS method can be applied to each equation separately. (3) The forecasts 
obtained by this method are in many cases better than those obtained from the more
complex simultaneous-equation models. 14

But the critics of VAR modeling point out the following problems: 

1. Unlike simultaneous-equation models, a VAR model is a-theoretic because it uses 
less prior information. Recall that in simultaneous-equation models exclusion or inclusion 
of certain variables plays a crucial role in the identification of the model. 

2. Because of their emphasis on forecasting, VAR models are less suited for policy
analysis.

3. The biggest practical challenge in VAR modeling is to choose the appropriate lag 
length. Suppose you have a three-variable VAR model and you decide to include eight lags 
of each variable in each equation. You will have 24 lagged parameters in each equation plus 
the constant term, for a total of 25 parameters. Unless the sample size is large, estimating 
that many parameters will consume a lot of degrees of freedom with all the problems
associated with that. 15

4. Strictly speaking, in an m-variable VAR model, all the m variables should be (jointly)
stationary. If that is not the case, we will have to transform the data appropriately (e.g., by
first-differencing). As Harvey notes, the results from the transformed data may be
unsatisfactory. He further notes that “The usual approach adopted by VAR aficionados is therefore
to work in levels, even if some of these series are nonstationary. In this case, it is important to
recognize the effect of unit roots on the distribution of estimators.” 16 Worse yet, if the model
contains a mix of I(0) and I(1) variables, that is, a mix of stationary and nonstationary
variables, transforming the data will not be easy.

However, Cuthbertson argues that, “... cointegration analysis indicates that a VAR solely
in first differences is misspecified, if there are some cointegrating vectors present among the
I(1) series. Put another way, a VAR solely in first differences omits potentially important


13 Sometimes purely exogenous variables are included to allow for trend and seasonal factors.

14 See, for example, T. Kinal and J. B. Ratner, "Regional Forecasting Models with Vector
Autoregression: The Case of New York State," Discussion Paper #155, Department of Economics, State
University of New York at Albany, 1982.

15 If we have an m-equation VAR model with p lagged values of the m variables, in all we have to
estimate (m + pm²) parameters.

16 Andrew Harvey, The Econometric Analysis of Time Series, The MIT Press, 2d ed., Cambridge, Mass., 
1990, p. 83. 
stationary variables (i.e., the error-correction, cointegrating vectors) and hence parameter
estimates may suffer from omitted variables bias.” 17

5. Since the individual coefficients in the estimated VAR models are often difficult to
interpret, the practitioners of this technique often estimate the so-called impulse
response function (IRF). The IRF traces out the response of the dependent variable in the
VAR system to shocks in the error terms, such as u1 and u2 in Eqs. (22.9.1) and (22.9.2).
Suppose u1 in the M1 equation increases by a value of one standard deviation. Such a
shock or change will change M1 in the current as well as future periods. But since M1
appears in the R regression, the change in u1 will also have an impact on R. Similarly, a
change of one standard deviation in u2 of the R equation will have an impact on M1. The
IRF traces out the impact of such shocks for several periods in the future. Although the
utility of such IRF analysis has been questioned by researchers, it is the centerpiece of
VAR analysis. 18
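The IRF described in item 5 can be computed from the estimated coefficient matrices. The NumPy sketch below (function name hypothetical) first builds the moving average matrices Ψ_0 = I, Ψ_h = A_1Ψ_{h−1} + ... + A_pΨ_{h−p}, and then scales them by a Cholesky factor of the residual covariance matrix so that each shock is one standard deviation in size. The Cholesky (orthogonalized) scheme is our assumption: the text does not say how the shocks are defined, and with correlated errors the responses depend on the ordering of the variables.

```python
import numpy as np

def impulse_responses(A_list, sigma, horizon):
    """Orthogonalized impulse responses of a VAR(p).
    A_list: [A_1, ..., A_p], each (m, m); sigma: (m, m) residual covariance.
    Returns an array of shape (horizon + 1, m, m) whose element [h, i, j] is the response
    of variable i, h periods after a one-standard-deviation shock to equation j."""
    m = sigma.shape[0]
    p = len(A_list)
    P = np.linalg.cholesky(sigma)  # lower triangular, P @ P.T == sigma
    psi = [np.eye(m)]              # Psi_0 = I
    for h in range(1, horizon + 1):
        psi.append(sum(A_list[j - 1] @ psi[h - j] for j in range(1, min(h, p) + 1)))
    return np.array([ps @ P for ps in psi])

# Illustrative VAR(1) with independent equations and unit-variance shocks.
irf = impulse_responses([np.diag([0.5, 0.2])], np.eye(2), horizon=3)
```

For a fitted two-variable model such as the one in Table 22.3, `A_list` would hold the lag-1 and lag-2 coefficient matrices and `sigma` the residual covariance matrix of the two equations.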

For a comparison of the performance of VAR with other forecasting techniques, the 
reader may consult the references. 19 

An Application of VAR: A VAR Model of the Texas Economy 

To test the conventional wisdom, “As the oil patch goes, so goes the Texas economy,” 
Thomas Fomby and Joseph Hirschberg developed a three-variable VAR model of the Texas 
economy for the period 1974-I to 1988-I. 20 The three variables considered were (1) per- 
centage change in the real price of oil, (2) percentage change in Texas nonagricultural em- 
ployment, and (3) percentage change in nonagricultural employment in the rest of the United 
States. The authors introduced a constant term and two lagged values of each variable in 
each equation. Therefore, the number of parameters estimated in each equation was seven 
(the constant plus two lags of each of the three variables). 
The results of the OLS estimation of the VAR model are given in Table 22.4. The F tests 
given in this table test the hypothesis that the lagged coefficients of a given variable are 
collectively zero. Thus, the F test for the x variable (percentage change in real price of oil) 
shows that the two lagged terms of x are jointly statistically different from zero; the 
probability of obtaining an F value of 12.5536 under the null hypothesis that they are both 
simultaneously equal to zero is very low, about 0.00004. On the other hand, collectively, the 
two lagged y values (percentage change in Texas nonagricultural employment) are not 
significantly different from zero in explaining x; the F value is only 1.36. All other F statistics 
are to be interpreted similarly. 
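Because all three equations share the same regressors (a constant plus two lags of x, y, and z), each can be estimated by OLS, and one least-squares solve handles all of them at once. The Python (NumPy) sketch below shows that bookkeeping and the resulting count of seven coefficients per equation. The series are simulated from made-up coefficients, not taken from the Fomby-Hirschberg data, and the function name `fit_var_ols` is illustrative.

```python
import numpy as np


def fit_var_ols(data, p):
    """Estimate every equation of a VAR(p) with a constant by OLS.

    data: (T, k) array, one column per variable.
    Returns B of shape (1 + k*p, k); column j holds equation j's coefficients
    in the order [constant, lag 1 of all k variables, ..., lag p of all k].
    """
    data = np.asarray(data, dtype=float)
    T, k = data.shape
    # Common regressor matrix: the same lags appear in every equation.
    X = np.hstack([np.ones((T - p, 1))] +
                  [data[p - lag:T - lag] for lag in range(1, p + 1)])
    Y = data[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves all k equations at once
    return B


# Simulated three-variable VAR(2); these are NOT the Texas data.
rng = np.random.default_rng(0)
c = np.array([0.1, 0.0, -0.1])
A1 = np.array([[0.5, 0.1, 0.0],
               [0.0, 0.4, 0.1],
               [0.1, 0.0, 0.3]])
A2 = np.array([[-0.2, 0.0, 0.0],
               [0.0, 0.1, 0.0],
               [0.0, 0.0, 0.1]])
T = 20_000
y = np.zeros((T, 3))
for t in range(2, T):
    y[t] = c + A1 @ y[t - 1] + A2 @ y[t - 2] + rng.normal(size=3)

B = fit_var_ols(y, p=2)
print("parameters per equation:", B.shape[0])
print("estimated lag-1 matrix:\n", np.round(B[1:4].T, 2))
```

Row 0 of `B` holds the constants, rows 1-3 the first-lag coefficients, and rows 4-6 the second-lag coefficients; with this many simulated observations the estimates land close to the matrices used to generate the data.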

On the basis of these and other results presented in their paper, Fomby and Hirschberg 
conclude that the conventional wisdom about the Texas economy is not quite accurate, for 
after the initial instability resulting from OPEC oil shocks, the Texas economy is now less 
dependent on fluctuations in the price of oil. 


17 Keith Cuthbertson, Quantitative Financial Economics: Stocks, Bonds and Foreign Exchange, 
John Wiley & Sons, New York, 2002, p. 436. 

18 D. E. Runkle, "Vector Autoregression and Reality," Journal of Business and Economic Statistics, vol. 5, 
1987, pp. 437-454. 

19 S. McNees, "Forecasting Accuracy of Alternative Techniques: A Comparison of U.S. Macroeconomic 
Forecasts," Journal of Business and Economic Statistics, vol. 4, 1986, pp. 5-15; and E. Mahmoud, 
"Accuracy in Forecasting: A Survey," Journal of Forecasting, vol. 3, 1984, pp. 139-159. 

20 Thomas B. Fomby and Joseph C. Hirschberg, "Texas in Transition: Dependence on Oil and the 
National Economy," Economic Review, Federal Reserve Bank of Dallas, January 1989, pp. 11-28. 
TABLE 22.4 Estimation Results for Second-Order Texas VAR System: 1974-I to 1988-I 
Source: Federal Reserve Bank of Dallas, January 1989, p. 21. 

Dependent variable: x (percentage change in real price of oil) 

Variable    Lag   Coefficient   Standard error   Significance level 
x           1      0.7054        0.1409           0.8305E-5 
x           2     -0.3351        0.1500           0.3027E-1 
y           1     -1.3525        2.7013           0.6189 
y           2      3.4371        2.4344           0.1645 
z           1      3.4566        2.8048           0.2239 
z           2     -4.8703        2.7500           0.8304E-1 
Constant    0     -0.9983E-2     0.1696E-1        0.5589 

$R^2$ = 0.2982; Q(21) = 8.2618 (P = 0.9939) 

Tests for joint significance, dependent variable = x 

Variable    F-statistic   Significance level 
x           12.5536       0.4283E-4 
y            1.3646       0.2654 
z            1.5693       0.2188 

Dependent variable: y (percentage change in Texas nonagricultural employment) 

Variable    Lag   Coefficient   Standard error   Significance level 
x           1      0.2228E-1     0.8759E-2        0.1430E-1 
x           2     -0.1883E-2     0.9322E-2        0.8407 
y           1      0.6462        0.1678           0.3554E-3 
y           2      0.4234E-1     0.1512           0.7807 
z           1      0.2655        0.1742           0.1342 
z           2     -0.1715        0.1708           0.3205 
Constant    0     -0.1602E-2     0.1053E-1        0.1351 

$R^2$ = 0.6316; Q(21) = 21.5900 (P = 0.4234) 

Tests for joint significance, dependent variable = y 

Variable    F-statistic   Significance level 
x            3.6283       0.3424E-4 
y           19.1440       0.8287E-6 
z            1.1684       0.3197 

Dependent variable: z (percentage change in nonagricultural employment in rest of United States) 

Variable    Lag   Coefficient   Standard error   Significance level 
x           1     -0.8330E-2     0.6849E-2        0.2299 
x           2      0.3635E-2     0.7289E-2        0.6202 
y           1      0.3849        0.1312           0.5170E-2 
y           2     -0.4805        0.1182           0.1828E-2 
z           1      0.7226        0.1362           0.3004E-5 
z           2     -0.1366E-1     0.1336           0.9190 
Constant    0     -0.2387E-2     0.8241E-3        0.5701E-2 

$R^2$ = 0.6503; Q(21) = 15.6182 (P = 0.7907) 

Tests for joint significance, dependent variable = z 

Variable    F-statistic   Significance level 
x            0.7396       0.4827 
y            8.2714       0.8360E-3 
z           27.9609       0.1000E-7 
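Each panel of Table 22.4 reports a Q(21) statistic for that equation's residuals, a portmanteau check of whether the first 21 residual autocorrelations are jointly zero. The table does not say which variant was used; the Python sketch below assumes the Ljung-Box form $Q = n(n+2)\sum_{k=1}^{m} \hat{\rho}_k^2/(n-k)$ and computes it from scratch. The function names are illustrative, and the step of converting Q into a p value with the chi-square distribution is omitted.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelations rho_1, ..., rho_nlags of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([x[k:] @ x[:-k] / denom for k in range(1, nlags + 1)])


def ljung_box_q(resid, m):
    """Ljung-Box portmanteau statistic built from the first m autocorrelations."""
    n = len(resid)
    rho = sample_acf(resid, m)
    lags = np.arange(1, m + 1)
    return n * (n + 2) * np.sum(rho ** 2 / (n - lags))


# A perfectly alternating series is about as far from white noise as it gets.
alternating = np.array([1.0, -1.0] * 50)
print(sample_acf(alternating, 1))  # strongly negative first-order autocorrelation
print(ljung_box_q(alternating, 1))
```

Residuals that behave like white noise give a Q close to its degrees of freedom, which is what the large P values in the first and third panels of Table 22.4 suggest.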
22.10 Measuring Volatility in Financial Time Series: 
The ARCH and GARCH Models 



As noted in the introduction to this chapter, financial time series, such as stock prices, 
exchange rates, inflation rates, etc., often exhibit the phenomenon of volatility clustering, 
that is, periods in which their values show wide swings for an extended time period 
followed by periods of relative calm. As Philip Hans Franses notes: 

Since such [financial time series] data reflect the result of trading among buyers and sellers at, 
for example, stock markets, various sources of news and other exogenous economic events 
may have an impact on the time series pattern of asset prices. Given that news can lead to 
various interpretations, and also given that specific economic events like an oil crisis can last 
for some time, we often observe that large positive and large negative observations in financial 
time series tend to appear in clusters. 21 

Knowledge of volatility is of crucial importance in many areas. For example, considerable 
macroeconometric work has been done in studying the variability of inflation over 
time. For some decision makers, inflation in itself may not be bad, but its variability is bad 
because it makes financial planning difficult. 

The same is true of importers, exporters, and traders in foreign exchange markets, for 
variability in the exchange rates means huge losses or profits. Investors in the stock market 
are obviously interested in the volatility of stock prices, for high volatility could mean huge 
losses or gains and hence greater uncertainty. In volatile markets it is difficult for 
companies to raise capital in the capital markets. 

How do we model financial time series that may experience such volatility? For example, 
how do we model time series of stock prices, exchange rates, inflation, etc.? A 
characteristic of most of these financial time series is that in their level form they are random 
walks; that is, they are nonstationary. On the other hand, in the first difference form, they 
are generally stationary, as we saw in the case of GDP series in the previous chapter, even 
though GDP is not strictly a financial time series. 

Therefore, instead of modeling the levels of financial time series, why not model their first 
differences? But these first differences often exhibit wide swings, or volatility, suggesting 
that the variance of financial time series varies over time. How can we model such “varying 
variance”? This is where the so-called autoregressive conditional heteroscedasticity 
(ARCH) model originally developed by Engle comes in handy. 22 

As the name suggests, heteroscedasticity, or unequal variance, may have an autoregressive 
structure in that heteroscedasticity observed over different periods may be autocorrelated. 
To see what all this means, let us consider a concrete example. 

EXAMPLE 22.1 

U.S./U.K. Exchange Rate: An Example 

Figure 22.6 gives logs of the monthly U.S./U.K. exchange rate (dollars per pound) for the 
period 1971-2007, for a total of 444 monthly observations. As you can see from this 
figure, there are considerable ups and downs in the exchange rate over the sample period. 

To see this more vividly, in Figure 22.7 we plot the changes in the logs of the exchange 
rate; note that changes in the log of a variable denote relative changes, which, if 
multiplied by 100, give percentage changes. As you can observe, the relative changes in the 
U.S./U.K. exchange rate show wide swings in some time periods and rather moderate swings 
in others, thus exemplifying the phenomenon of volatility clustering. 

FIGURE 22.6 Log of U.S./U.K. exchange rate, 1971-2007 (monthly) 

FIGURE 22.7 Change in the log of U.S./U.K. exchange rate 

21 Philip Hans Franses, Time Series Models for Business and Economic Forecasting, Cambridge University 
Press, New York, 1998, p. 155. 

22 R. Engle, "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United 
Kingdom Inflation," Econometrica, vol. 50, no. 4, 1982, pp. 987-1007. See also A. Bera and M. Higgins, 
"ARCH Models: Properties, Estimation and Testing," Journal of Economic Surveys, vol. 7, 1993, 
pp. 305-366. 

Now the practical question is: How do we statistically measure volatility? Let us 
illustrate this with our exchange rate example. 

Let $Y_t$ = U.S./U.K. exchange rate 
$Y_t^*$ = log of $Y_t$ 
$dY_t^* = Y_t^* - Y_{t-1}^*$ = relative change in the exchange rate 
$\overline{dY^*}$ = mean of $dY_t^*$ 
$X_t = dY_t^* - \overline{dY^*}$ 
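The transformations just defined take only a few lines. This Python (NumPy) sketch uses four made-up dollars-per-pound values (not the actual 1971-2007 series) to compute $Y_t^*$, $dY_t^*$, and $X_t$, and to show that 100 times a log change is close to, but not identical with, the exact percentage change.

```python
import numpy as np

rates = np.array([2.00, 2.04, 1.98, 2.01])  # hypothetical Y_t, dollars per pound
y_star = np.log(rates)                      # Y*_t = log of Y_t
dy_star = np.diff(y_star)                   # dY*_t = Y*_t - Y*_{t-1}
x = dy_star - dy_star.mean()                # X_t: mean-adjusted relative change

exact_pct = 100 * (rates[1:] / rates[:-1] - 1)
print("100 x log change:", np.round(100 * dy_star, 3))
print("exact % change:  ", np.round(exact_pct, 3))
print("X_t^2:", x ** 2)                     # the volatility measure used below
```

Note that differencing costs one observation, and that the mean-adjusted $X_t$ sums to zero by construction.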


Thus, $X_t$ is the mean-adjusted relative change in the exchange rate. Now we can use $X_t^2$ 
as a measure of volatility. Being a squared quantity, its value will be high in periods when 
there are big changes in the prices of financial assets and comparatively small when there 
are modest changes in the prices of financial assets. 23 

Accepting $X_t^2$ as a measure of volatility, how do we know if it changes over time? 
Suppose we consider the following AR(1), or ARIMA(1, 0, 0), model: 

$$X_t^2 = \beta_0 + \beta_1 X_{t-1}^2 + u_t \qquad (22.10.1)$$

This model postulates that volatility in the current period is related to its value in the 
previous period plus a white noise error term. If $\beta_1$ is positive, it suggests that if volatility was 
high in the previous period, it will continue to be high in the current period, indicating 
volatility clustering. If $\beta_1$ is zero, then there is no volatility clustering. The statistical 
significance of the estimated $\beta_1$ can be judged by the usual t test. 

There is nothing to prevent us from considering an AR(p) model of volatility such that 

$$X_t^2 = \beta_0 + \beta_1 X_{t-1}^2 + \beta_2 X_{t-2}^2 + \cdots + \beta_p X_{t-p}^2 + u_t \qquad (22.10.2)$$

This model suggests that volatility in the current period is related to volatility in the past p 
periods, the value of p being an empirical question. This empirical question can be resolved 
by one or more of the model selection criteria that we discussed in Chapter 13 (e.g., the 
Akaike information criterion). We can test the significance of any individual β coefficient by 
the t test and the collective significance of two or more coefficients by the usual F test. 

Model (22.10.1) is an example of an ARCH(1) model and Eq. (22.10.2) is called an 
ARCH(p) model, where p represents the number of autoregressive terms in the model. 
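Models (22.10.1) and (22.10.2) are ordinary OLS regressions of $X_t^2$ on its own lags, so they need nothing beyond least squares. The Python (NumPy) sketch below builds $X_t^2$ from a price-level series and reports the coefficients, conventional t ratios, and $R^2$. It is exercised on a simulated ARCH(1) series (assumed $\alpha_0 = 10^{-4}$, $\alpha_1 = 0.25$), not on the exchange-rate data behind the results that follow; the function names are illustrative.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients, conventional t ratios, and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    s2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, beta / se, r2


def volatility_regression(levels, p):
    """Regress X_t^2 on a constant and X_{t-1}^2, ..., X_{t-p}^2."""
    dy_star = np.diff(np.log(levels))      # relative changes dY*_t
    x2 = (dy_star - dy_star.mean()) ** 2   # X_t^2, the volatility proxy
    y = x2[p:]
    X = np.column_stack([np.ones(len(y))] +
                        [x2[p - lag:len(x2) - lag] for lag in range(1, p + 1)])
    return ols(y, X)


# Simulated ARCH(1) relative changes turned into a price level (illustration only).
rng = np.random.default_rng(42)
alpha0, alpha1, n = 1e-4, 0.25, 5000
u = np.zeros(n)
for t in range(1, n):
    u[t] = np.sqrt(alpha0 + alpha1 * u[t - 1] ** 2) * rng.standard_normal()
levels = 1.5 * np.exp(np.cumsum(u))

beta, tstat, r2 = volatility_regression(levels, p=1)
print("coefficients:", beta, "t ratios:", tstat, "R^2:", r2)
```

Passing `p=2` or higher gives the ARCH(p) version (22.10.2); the slope on $X_{t-1}^2$ should come out near the assumed $\alpha_1$.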

Before proceeding further, let us illustrate the ARCH model with the U.S./U.K. 
exchange rate data. The results of the ARCH(1) model were as follows. 

$$\hat{X}_t^2 = 0.00043 + 0.23036\, X_{t-1}^2 \qquad (22.10.3)$$
t = (7.71)  (4.97) 
$R^2$ = 0.0531   d = 1.9933 

where $X_t^2$ is as defined before. 

Since the coefficient of the lagged term is highly significant (p value of about 0.000), it 
seems volatility clustering is present in this instance. We tried higher-order ARCH 
models, but only the ARCH(1) model turned out to be significant. 

How would we test for the ARCH effect in a regression model in general that is based 
on time series data? To be more specific, let us consider the k-variable linear regression 
model: 

$$Y_t = \beta_1 + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + u_t \qquad (22.10.4)$$

and assume that, conditional on the information available at time (t - 1), the disturbance 
term is distributed as 

$$u_t \sim N[0, (\alpha_0 + \alpha_1 u_{t-1}^2)] \qquad (22.10.5)$$

that is, $u_t$ is normally distributed with zero mean and 

$$\operatorname{var}(u_t) = \alpha_0 + \alpha_1 u_{t-1}^2 \qquad (22.10.6)$$

that is, the variance of $u_t$ follows an ARCH(1) process. 

The normality of $u_t$ is not new to us. What is new is that the variance of u at time t 
depends on the squared disturbance at time (t - 1), thus giving the appearance of serial 
correlation. 24 Of course, the error variance may depend not only on one lagged term of 
the squared error term but also on several lagged squared terms as follows: 

$$\operatorname{var}(u_t) = \sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_p u_{t-p}^2 \qquad (22.10.7)$$

If there is no autocorrelation in the error variance, we have 

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_p = 0 \qquad (22.10.8)$$

in which case $\operatorname{var}(u_t) = \alpha_0$, and we do not have the ARCH effect. 

Since we do not directly observe $\sigma_t^2$, Engle has shown that the preceding null hypothesis 
can easily be tested by running the following regression: 

$$\hat{u}_t^2 = \hat{\alpha}_0 + \hat{\alpha}_1 \hat{u}_{t-1}^2 + \hat{\alpha}_2 \hat{u}_{t-2}^2 + \cdots + \hat{\alpha}_p \hat{u}_{t-p}^2 \qquad (22.10.9)$$

where $\hat{u}_t$, as usual, denotes the OLS residuals obtained from the original regression model 
(22.10.4). 

One can test the null hypothesis $H_0$ by the usual F test or, alternatively, by computing 
$nR^2$, where $R^2$ is the coefficient of determination from the auxiliary regression (22.10.9). 
It can be shown that 

$$nR^2 \underset{\text{asy}}{\sim} \chi^2_p \qquad (22.10.10)$$

that is, in large samples $nR^2$ follows the chi-square distribution with df equal to the 
number of autoregressive terms in the auxiliary regression. 

Before we proceed to illustrate, make sure that you do not confuse autocorrelation of 
the error term, as discussed in Chapter 12, with the ARCH model. In the ARCH model it is 
the (conditional) variance of $u_t$ that depends on the (squared) previous error terms, thus 
giving the impression of autocorrelation. 

23 You might wonder why we do not use the variance of $X_t$, $\sum X_t^2/n$, as a measure of volatility. 
This is because we want to take into account the changing volatility of asset prices over time. If we 
use the variance of $X_t$, it will only be a single value for a given data set. 
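Engle's test is straightforward to program once the OLS residuals are in hand. The Python (NumPy) sketch below regresses $\hat{u}_t^2$ on a constant and p of its own lags, as in Eq. (22.10.9), and returns $nR^2$. Here n is taken to be the number of observations actually used in the auxiliary regression (p fewer than the residual series), a common convention that is an assumption of this sketch. The 5 percent chi-square critical values for p = 1 to 4 are standard table values. In a real application the residuals would come from regression (22.10.4); here simulated error series stand in for them, and the name `arch_lm_test` is illustrative.

```python
import numpy as np

# Upper 5% points of the chi-square distribution, df = 1..4.
CHI2_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def arch_lm_test(resid, p):
    """Engle's ARCH LM statistic nR^2 from the auxiliary regression (22.10.9)."""
    u2 = np.asarray(resid, dtype=float) ** 2
    y = u2[p:]
    X = np.column_stack([np.ones(len(y))] +
                        [u2[p - lag:len(u2) - lag] for lag in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_aux = y - X @ coef
    r2 = 1 - (resid_aux @ resid_aux) / np.sum((y - y.mean()) ** 2)
    return len(y) * r2


rng = np.random.default_rng(7)
n = 3000
calm = rng.standard_normal(n)          # homoscedastic errors: H0 is true
arch = np.zeros(n)                     # ARCH(1) errors, assumed alpha0 = 0.2, alpha1 = 0.5
for t in range(1, n):
    arch[t] = np.sqrt(0.2 + 0.5 * arch[t - 1] ** 2) * rng.standard_normal()

for name, e in [("homoscedastic", calm), ("ARCH(1)", arch)]:
    stat = arch_lm_test(e, p=1)
    verdict = "reject H0" if stat > CHI2_05[1] else "do not reject H0"
    print(f"{name}: nR^2 = {stat:.2f} -> {verdict}")
```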


EXAMPLE 22.2 

New York Stock Exchange Price Changes 


As a further illustration of the ARCH effect, Figure 22.8 presents the monthly percentage 
change in the NYSE (New York Stock Exchange) Index for the period 1966-2002. 25 It is 
evident from this graph that the percent price changes in the NYSE Index exhibit 
considerable volatility. Notice especially the wide swing around the 1987 crash in stock prices. 

To capture the volatility in the stock return seen in the figure, let us consider a very 
simple model: 


$$Y_t = \beta_1 + u_t \qquad (22.10.11)$$

where $Y_t$ = percent change in the NYSE stock index and $u_t$ = random error term. 


24 A technical note: Remember that for our classical linear model the variance of $u_t$ was assumed to be 
$\sigma^2$, which in the present context becomes the unconditional variance. If $\alpha_1 < 1$, the stability 
condition, we can write $\sigma^2 = \alpha_0 + \alpha_1 \sigma^2$, that is, $\sigma^2 = \alpha_0/(1 - \alpha_1)$. This shows that the 
unconditional variance of u does not depend on t, but does depend on the ARCH parameter $\alpha_1$. 

25 This graph and the regression results presented in this example are based on the data collected by 
Gary Koop, Analysis of Economic Data, John Wiley & Sons, New York, 2000 (data from the data disk). The 
monthly percentage change in the stock price index can be regarded as a rate of return on the index. 


FIGURE 22.8 Monthly percent change in the NYSE Price Index, 1966-2002. 

Notice that besides the intercept, there is no other explanatory variable in the model. 
From the data, we obtained the following OLS regression: 

$$\hat{Y}_t = 0.00574 \qquad (22.10.12)$$
t = (3.36) 
d = 1.4915 

What does this intercept denote? It is simply the average percent rate of return on the 
NYSE index, or the mean value of Y t (can you verify this?). Thus over the sample period 
the average monthly return on the NYSE index was about 0.00574 percent. 
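The verification asked for above takes one line. With an intercept as the only regressor, OLS chooses $\hat{\beta}_1$ to minimize $\sum_t (Y_t - \hat{\beta}_1)^2$, and setting the derivative to zero gives

$$-2\sum_{t=1}^{n} (Y_t - \hat{\beta}_1) = 0 \quad\Longrightarrow\quad \hat{\beta}_1 = \frac{1}{n}\sum_{t=1}^{n} Y_t = \bar{Y}$$

so the estimate 0.00574 in (22.10.12) is the sample mean of $Y_t$, and the residuals used next are simply the deviations $\hat{u}_t = Y_t - \bar{Y}$.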

Now we obtain the residuals from the preceding regression and estimate the ARCH(1) 
model, which gave the following results: 

$$\hat{u}_t^2 = 0.000007 + 0.25406\, \hat{u}_{t-1}^2 \qquad (22.10.13)$$
t = (0.000)  (5.52) 
$R^2$ = 0.0645   d = 1.9464 

where $\hat{u}_t$ is the estimated residual from regression (22.10.12). 

Since the lagged squared disturbance term is statistically significant (p value of about 
0.000), it seems the error variances are correlated; that is, there is an ARCH effect. We tried 
higher-order ARCH models but only ARCH(1) was statistically significant. 


What to Do If ARCH Is Present 

Recall that we have discussed several methods of correcting for heteroscedasticity, which 
basically involved applying OLS to transformed data. Remember that OLS applied to transformed 
data is generalized least squares (GLS). If the ARCH effect is found, we will have 
to use GLS. We will not pursue the technical details, for they are beyond the scope of 
this book. 26 Fortunately, software packages such as EViews, SHAZAM, MICROFIT, and 
PC-GIVE now have user-friendly routines to estimate such models. 
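Purely as an illustration of the GLS idea, here is one simple two-step feasible GLS (weighted least squares) scheme in Python (NumPy): fit OLS, estimate an ARCH(1) variance for each period from the auxiliary regression (22.10.9) with p = 1, then weight each observation by $1/\hat{\sigma}_t$ and rerun OLS. This is an assumption-laden sketch, not the procedure used by EViews or the other packages named above, which generally estimate ARCH models by maximum likelihood. The function name `fgls_arch1`, the variance floor, and the simulated data are all illustrative.

```python
import numpy as np


def fgls_arch1(y, X, floor=1e-8):
    """Two-step feasible GLS for a regression whose errors follow ARCH(1).

    Step 1: OLS of y on X gives residuals u_hat.
    Step 2: OLS of u_hat_t^2 on [1, u_hat_{t-1}^2] gives fitted variances.
    Step 3: divide y_t and X_t (t >= 2) by the fitted sigma_t and rerun OLS.
    """
    y, X = np.asarray(y, float), np.asarray(X, float)
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ b_ols) ** 2
    Z = np.column_stack([np.ones(len(u2) - 1), u2[:-1]])
    a, *_ = np.linalg.lstsq(Z, u2[1:], rcond=None)
    sigma2 = np.maximum(Z @ a, floor)   # guard against non-positive fitted variances
    w = 1 / np.sqrt(sigma2)
    b_gls, *_ = np.linalg.lstsq(X[1:] * w[:, None], y[1:] * w, rcond=None)
    return b_ols, b_gls, a


rng = np.random.default_rng(11)
n = 4000
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = np.sqrt(0.5 + 0.4 * u[t - 1] ** 2) * rng.standard_normal()
y = 1.0 + 2.0 * x + u                   # true intercept 1, slope 2
X = np.column_stack([np.ones(n), x])

b_ols, b_gls, a = fgls_arch1(y, X)
print("OLS:", b_ols, "FGLS:", b_gls, "ARCH(1) coefficients:", a)
```

Both estimators are centered on the true values here because x is independent of u; the point of reweighting is efficiency, which a single simulated sample cannot demonstrate.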


26 Consult Russell Davidson and James C. MacKinnon, Estimation and Inference in Econometrics, Oxford 
University Press, New York, 1993, Section 16.4 and William H. Greene, Econometric Analysis, 4th ed., 
Prentice Hall, Englewood Cliffs, NJ, 2000, Section 18.5. 
A Word on the Durbin-Watson d and the ARCH Effect 

We have reminded the reader several times that a significant d statistic may not always 
mean that there is significant autocorrelation in the data at hand. Very often a significant d 
value is an indication of the model specification errors that we discussed in Chapter 13. 
Now we have an additional specification error, due to the ARCH effect. Therefore, in a time 
series regression, if a significant d value is obtained, we should test for the ARCH effect 
before accepting the d statistic at its face value. An example is given in Exercise 22.23. 

A Note on the GARCH Model 

Since its “discovery” in 1982, ARCH modeling has become a growth industry, with all 
kinds of variations on the original model. One that has become popular is the generalized 
autoregressive conditional heteroscedasticity (GARCH) model, originally proposed 
by Bollerslev. 27 The simplest GARCH model is the GARCH(1, 1) model, which can be 
written as: 

$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 \sigma_{t-1}^2 \qquad (22.10.14)$$

which says that the conditional variance of u at time t depends not only on the squared error 
term in the previous time period (as in ARCH[1]) but also on its conditional variance in the 
previous time period. This model can be generalized to a GARCH(p, q) model in which there 
are p lagged terms of the squared error term and q terms of the lagged conditional variances. 
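The recursion in Eq. (22.10.14) can be traced numerically. The Python (NumPy) sketch below uses the text's symbols ($\alpha_2$ multiplies $\sigma_{t-1}^2$) and computes the conditional variance path implied by a given error series and given parameters. How the first variance is initialized is an assumption of the sketch: by default it uses the unconditional variance $\alpha_0/(1 - \alpha_1 - \alpha_2)$, which exists only when $\alpha_1 + \alpha_2 < 1$. The function name `garch11_variance` is illustrative.

```python
import numpy as np


def garch11_variance(u, alpha0, alpha1, alpha2, sigma2_init=None):
    """Conditional variances sigma_t^2 = alpha0 + alpha1*u_{t-1}^2 + alpha2*sigma_{t-1}^2."""
    u = np.asarray(u, dtype=float)
    if sigma2_init is None:
        if alpha1 + alpha2 >= 1:
            raise ValueError("unconditional variance undefined; pass sigma2_init")
        sigma2_init = alpha0 / (1 - alpha1 - alpha2)
    sigma2 = np.empty(len(u))
    sigma2[0] = sigma2_init
    for t in range(1, len(u)):
        sigma2[t] = alpha0 + alpha1 * u[t - 1] ** 2 + alpha2 * sigma2[t - 1]
    return sigma2


print(garch11_variance([1.0, 0.0, 2.0], alpha0=0.1, alpha1=0.2, alpha2=0.5,
                       sigma2_init=1.0))
```

Setting `alpha2=0` drops the lagged-variance term and the recursion reduces to the ARCH(1) variance of Eq. (22.10.6).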

We will not pursue the technical details of these models, as they are involved, except 
to point out that a GARCH(1, 1) model is equivalent to an ARCH(2) model and a 
GARCH(p, q) model is equivalent to an ARCH(p + q) model. 28 

For our U.S./U.K. exchange rate and NYSE stock return examples, we have already 
stated that an ARCH(2) model was not significant, suggesting that perhaps a GARCH(1, 1) 
model is not appropriate in these cases. 

22.11 Concluding Examples 

We conclude this chapter by considering a few additional examples that illustrate some of 
the points we have made in this chapter. 


EXAMPLE 22.3 

The Relationship between the Help-Wanted Index (HWI) and the Unemployment Rate (UN) 
from January 1969 to January 2000 


To study causality between HWI and UN, two indicators of labor market conditions in the 
United States, Marc A. Giammatteo considered the following regression model: 29 

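$$\mathrm{HWI}_t = \alpha_0 + \sum_{i=1}^{25} \alpha_i \mathrm{UN}_{t-i} + \sum_{j=1}^{25} \beta_j \mathrm{HWI}_{t-j} \qquad (22.11.1)$$

$$\mathrm{UN}_t = \lambda_0 + \sum_{i=1}^{25} \lambda_i \mathrm{UN}_{t-i} + \sum_{j=1}^{25} \delta_j \mathrm{HWI}_{t-j} \qquad (22.11.2)$$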

To save space we will not present the actual regression results, but the main conclusion 
that emerges from this study is that there is bilateral causality between the two labor 
market indicators and this conclusion did not change when the lag length was varied. The 
data on HWI and UN are given on the textbook website as Table 22.5. 
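Giammatteo's estimates are not reproduced here, but the mechanics of a Granger causality test of the kind behind Eqs. (22.11.1) and (22.11.2) can be sketched. The Python (NumPy) function below compares a restricted regression (a constant and m own lags) with an unrestricted one (adding m lags of the other variable) through the usual F statistic built from the two residual sums of squares. The lag length and the simulated series are assumptions for illustration; 25 lags, as in the study, would be passed as `m=25`. The function names are illustrative.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef
    return e @ e


def granger_f(y, x, m):
    """F statistic for H0: m lags of x add nothing to a regression of y on its own m lags."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    T = len(y)
    target = y[m:]
    ones = np.ones((T - m, 1))
    y_lags = np.column_stack([y[m - lag:T - lag] for lag in range(1, m + 1)])
    x_lags = np.column_stack([x[m - lag:T - lag] for lag in range(1, m + 1)])
    X_r = np.hstack([ones, y_lags])            # restricted: own lags only
    X_u = np.hstack([ones, y_lags, x_lags])    # unrestricted: add lags of x
    rss_r, rss_u = rss(target, X_r), rss(target, X_u)
    df_denom = len(target) - X_u.shape[1]
    return ((rss_r - rss_u) / m) / (rss_u / df_denom)


rng = np.random.default_rng(5)
T = 600
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + rng.standard_normal()  # x drives y only

print("F for x -> y:", granger_f(y, x, m=2))
print("F for y -> x:", granger_f(x, y, m=2))
```

Running the test in both directions, as here, is what distinguishes unidirectional from bilateral causality.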


27 T. Bollerslev, "Generalized Autoregressive Conditional Heteroscedasticity," Journal of Econometrics, 
vol. 31, 1986, pp. 307-326. 

28 For details, see Davidson and MacKinnon, op. cit., pp. 558-560. 

29 Marc A. Giammatteo (West Point, Class of 2000), "The Relationship between the Help Wanted Index 
and the Unemployment Rate," unpublished term paper. (Notations altered to conform to our notation.) 
EXAMPLE 22.4 

ARIMA Modeling of the Yen/Dollar Exchange Rate: January 1971 to April 2008 


The yen/dollar exchange rate (¥/$) is a key exchange rate. From the logarithms of the 
monthly ¥/$, it was found that in the level form this exchange rate showed the typical 
pattern of a nonstationary time series. But the first differences were found to be 
stationary; their graph pretty much resembles Figure 22.8. 

Unit root analysis confirmed that the first differences of the logs of ¥/$ were stationary. 
After examining the correlogram of the log first differences, we estimated the following 
MA(1) model: 

$$\hat{Y}_t = -0.0028 - 0.3300\, u_{t-1} \qquad (22.11.3)$$
t = (-1.71)  (-7.32) 
$R^2$ = 0.1012   d = 1.9808 

where $Y_t$ = first differences of the logs of ¥/$ and u = a white noise error term. 

To save space, we have provided the data underlying the preceding analysis on the 
textbook website in Table 22.6. Using these data, the reader is urged to try other models 
and compare their forecasting performances. 
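As a starting point for such comparisons, an MA(1) model like (22.11.3) can be fitted by conditional least squares. The Python (NumPy) sketch below writes the model as $Y_t = \mu + u_t + \theta u_{t-1}$, fixes $\mu$ at the sample mean, and grid-searches $\theta$, computing the errors recursively with $u_0 = 0$. This is a simplified stand-in for the exact maximum likelihood routines in statistical packages; it is checked on simulated data rather than the ¥/$ series in Table 22.6, and the name `ma1_css` and the grid are illustrative choices.

```python
import numpy as np


def ma1_css(y, grid=None):
    """Conditional least squares for y_t = mu + u_t + theta * u_{t-1}.

    mu is fixed at the sample mean; theta minimizes the sum of squared
    errors over a grid, with the pre-sample error u_0 set to zero.
    """
    y = np.asarray(y, dtype=float)
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 1901)  # candidate theta values, step 0.001
    mu = y.mean()
    u_prev = np.zeros_like(grid)
    ssr = np.zeros_like(grid)
    for dev in y - mu:                  # run the recursion for all thetas at once
        u = dev - grid * u_prev         # u_t = (y_t - mu) - theta * u_{t-1}
        ssr += u * u
        u_prev = u
    return mu, grid[np.argmin(ssr)]


# Simulated MA(1) series with assumed mu = -0.003 and theta = -0.33.
rng = np.random.default_rng(2)
e = rng.standard_normal(3001)
series = -0.003 + e[1:] - 0.33 * e[:-1]
mu_hat, theta_hat = ma1_css(series)
print(f"mu = {mu_hat:.4f}, theta = {theta_hat:.3f}")
```

Restricting the grid to $|\theta| < 1$ keeps the fitted model invertible, which the recursion for $u_t$ requires.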


EXAMPLE 22.5 

ARCH Model of the U.S. Inflation Rate: January 1947 to March 2008 


To see if the ARCH effect is present in the U.S. inflation rate as measured by the CPI, we 
obtained CPI data from January 1947 to March 2008. The plot of the logarithms of the CPI 
showed that the time series was nonstationary. But the plot of the first differences of the 
logs of the CPI, as shown in Figure 22.9, shows considerable volatility even though the 
first differences are stationary. 

Following the procedure outlined in regressions (22.10.12) and (22.10.13), we first 
regressed the logged first differences of CPI on a constant and obtained residuals from this 
equation. Squaring these residuals and regressing them on their first two lagged values, we 
obtained the following ARCH(2) model: 

$$\hat{u}_t^2 = 0.000028 + 0.12125\, \hat{u}_{t-1}^2 + 0.08718\, \hat{u}_{t-2}^2 \qquad (22.11.4)$$
t = (5.42) 
$R^2$ = 0.026   d = 2.0214 


FIGURE 22.9 First differences of the logs of CPI. 

As you can see, there is quite a bit of persistence in the volatility, as volatility in the current 
month depends on volatility in the preceding 2 months. The reader is advised to obtain 
CPI data from government sources and see whether another model, preferably a GARCH 
model, does a better job. 

Summary and Conclusions 


1. Box-Jenkins and VAR approaches to economic forecasting are alternatives to 
traditional single- and simultaneous-equation models. 

2. To forecast the values of a time series, the basic Box-Jenkins strategy is as follows: 

a. First examine the series for stationarity. This step can be done by computing the 
autocorrelation function (ACF) and the partial autocorrelation function (PACF) or by 
a formal unit root analysis. The correlograms associated with ACF and PACF are 
often good visual diagnostic tools. 

b. If the time series is not stationary, difference it one or more times to achieve stationarity. 

c. The ACF and PACF of the stationary time series are then computed to find out if the series 
is purely autoregressive or purely of the moving average type or a mixture of the two. 
From broad guidelines given in Table 22.1 one can then determine the values of p and q in 
the ARMA process to be fitted. At this stage the chosen ARMA(p, q) model is tentative. 

d. The tentative model is then estimated. 

e. The residuals from this tentative model are examined to find out if they are white 
noise. If they are, the tentative model is probably a good approximation to the 
underlying stochastic process. If they are not, the process is started all over again. 
Therefore, the Box-Jenkins method is iterative. 

f. The model finally selected can be used for forecasting. 

3. The VAR approach to forecasting considers several time series at a time. The 
distinguishing features of VAR are as follows: 

a. It is a truly simultaneous system in that all variables are regarded as endogenous. 

b. In VAR modeling the value of a variable is expressed as a linear function of the past, 
or lagged, values of that variable and all other variables included in the model. 

c. If each equation contains the same number of lagged variables in the system, it can 
be estimated by OLS without resorting to any systems method, such as two-stage 
least squares (2SLS) or seemingly unrelated regressions (SURE). 

d. This simplicity of VAR modeling may be its drawback. In view of the limited 
number of observations that are generally available in most economic analyses, 
introduction of several lags of each variable can consume a lot of degrees of freedom. 30 

e. If there are several lags in each equation, it is not always easy to interpret each 
coefficient, especially if the signs of the coefficients alternate. For this reason one examines 
the impulse response function (IRF) in VAR modeling to find out how the dependent 
variable responds to a shock administered to one or more equations in the system. 

f. There is considerable debate and controversy about the superiority of the various 
forecasting methods. Single-equation, simultaneous-equation, Box-Jenkins, and VAR 
methods of forecasting have their admirers as well as their detractors. All one can 
say is that there is no single method that will suit all situations. If that were the case, 
there would be no need for discussing the various alternatives. One thing is sure: 
The Box-Jenkins and VAR methodologies have now become an integral part of 
econometrics. 

30 Followers of Bayesian statistics believe that this problem can be minimized. See R. Litterman, 
"A Statistical Approach to Economic Forecasting," Journal of Business and Economic Statistics, 
vol. 4, 1986, pp. 1-4. 
4. We also considered in this chapter a special class of models, ARCH and GARCH, 
which are especially useful in analyzing financial time series, such as stock prices, 
inflation rates, and exchange rates. A distinguishing feature of these models is that the 
error variance may be correlated over time because of the phenomenon of volatility 
clustering. In this connection we also pointed out that in many cases a significant 
Durbin-Watson d may in fact be due to the ARCH or GARCH effect. 

5. There are variants of ARCH and GARCH models, but we have not considered them in 
this chapter due to space constraints. Some of these other models are GARCH-M 
(GARCH in mean), TGARCH (threshold GARCH), and EGARCH (exponential 
GARCH). A discussion of these models can be found in the references. 31 

EXERCISES 

Questions 

22.1. What are the major methods of economic forecasting? 

22.2. What are the major differences between simultaneous-equation and Box-Jenkins 
approaches to economic forecasting? 

22.3. Outline the major steps involved in the application of the Box-Jenkins approach to 
forecasting. 

22.4. What happens if Box-Jenkins techniques are applied to time series that are 
nonstationary? 

22.5. What are the differences between Box-Jenkins and VAR approaches to economic 
forecasting? 

22.6. In what sense is VAR atheoretic? 

22.7. “If the primary object is forecasting, VAR will do the job.” Critically evaluate this 
statement. 

22.8. Since the number of lags to be introduced in a VAR model can be a subjective 
question, how does one decide how many lags to introduce in a concrete application? 

22.9. Comment on this statement: “Box-Jenkins and VAR are prime examples of 
measurement without theory.” 

22.10. What is the connection, if any, between Granger causality tests and VAR modeling? 

Empirical Exercises 

22.11. Consider the data on log DPI (personal disposable income) introduced in Section 21.1 
(see the book’s website for the actual data). Suppose you want to fit a suitable ARIMA 
model to these data. Outline the steps involved in carrying out this task. 

22.12. Repeat Exercise 22.11 for the LPCE (personal consumption expenditure) data 
introduced in Section 21.1 (again, see the book’s website for the actual data). 

22.13. Repeat Exercise 22.11 for the LCP. 

22.14. Repeat Exercise 22.11 for the LDIVIDENDS. 

22.15. In Section 13.9 you were introduced to the Schwarz Information criterion (SIC) to 
determine lag length. How would you use this criterion to determine the appropriate 
lag length in a VAR model? 

22.16. Using the data on LPCE and LDPI introduced in Section 21.1 (see the book’s 
website for the actual data), develop a bivariate VAR model for the period 1970-I to 
2006-IV. Use this model to forecast the values of these variables for the four quarters 
of 2007 and compare the forecast values with the actual values given in the dataset. 

31 See Walter Enders, Applied Econometric Time Series, 2d ed., John Wiley & Sons, New York, 2004. For 
an application-oriented discussion, see Dimitrios Asteriou and Stephen Hall, Applied Econometrics: A 
Modern Approach, revised edition, Palgrave/Macmillan, New York, 2007, Chapter 14. 
22.17. Repeat Exercise 22.16, using the data on LDIVIDENDS and LCP. 

*22.18. Refer to any statistical package and estimate the impulse response function for a 
period of up to 8 lags for the VAR model that you developed in Exercise 22.16. 

22.19. Repeat Exercise 22.18 for the VAR model that you developed in Exercise 22.17. 

22.20. Refer to the VAR regression results given in Table 22.4. From the various F tests 
reported in the three regressions given there, what can you say about the nature of 
causality in the three variables? 

22.21. Continuing with Exercise 22.20, can you guess why the authors chose to express the 
three variables in the model in percentage change form rather than using the levels 
of these variables? (Hint: Stationarity.) 

22.22. Using the Canadian data given in Table 17.5, find out if M1 and R are stationary 
random variables. If not, are they cointegrated? Show the necessary calculations. 

22.23. Continue with the data given in Table 17.5. Now consider the following simple 
model of money demand in Canada: 

$$\ln M1_t = \beta_1 + \beta_2 \ln \mathrm{GDP}_t + \beta_3 \ln R_t + u_t$$

a. How would you interpret the parameters of this model? 

b. Obtain the residuals from this model and find out if there is any ARCH effect. 

22.24. Refer to the ARCH(2) model given in Eq. (22.11.4). Using the same data we 
estimated the following ARCH(1) model: 

$$\hat{u}_t^2 = 0.00000078 + 0.3737\, \hat{u}_{t-1}^2$$
t = (7.5843)  (10.2351) 
$R^2$ = 0.1397   d = 1.9896 

How would you choose between the two models? Show the necessary calculations. 

22.25. Table 22.7 gives data on three-month (TB3M) and six-month (TB6M) Treasury bill 
rates from January 1, 1982, to March 2008, for a total of 315 monthly observations. 
The data can be found on the textbook’s website. 

a. Plot the two time series in the same diagram. What do you see? 

b. Do a formal unit root analysis to find out if these time series are stationary. 

c. Are the two time series cointegrated? How do you know? Show the necessary 
calculations. 

d. What is the economic meaning of cointegration in the present context? If the two 
series are not cointegrated, what are the economic implications? 

e. If you want to estimate a VAR model, say, with four lags of each variable, do you 
have to use the first differences of the two series or can you do the analysis in 
levels of the two series? Justify your answer. 

22.26. Class Exercise: Pick a stock market index of your choosing and obtain daily data 
on the value of the chosen index for five years to find out if the stock index is 
characterized by ARCH effects. 

22.27. Class Exercise: Collect data on inflation and unemployment rates in the U.S. for the 
quarterly periods in 1980-2007 and develop and estimate a VAR model for the two 
variables. To compute the inflation rate, use CPI (consumer price index) and use the 
civilian unemployment rate for the unemployment rate. Pay careful attention to 
the stationarity of these variables. Also, find out if one variable Granger-causes the 
other variable. Present all your calculations. 


*Optional. 



Appendix A 

A Review of Some Statistical Concepts 


This appendix provides a very sketchy introduction to some of the statistical concepts 
encountered in the text. The discussion is nonrigorous, and no proofs are given because 
several excellent books on statistics do that job very well. Some of these books are listed at 
the end of this appendix. 


A.1 Summation and Product Operators 


The Greek capital letter Σ (sigma) is used to indicate summation. Thus, 

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$


Some of the important properties of the summation operator Σ are 

1. $\sum_{i=1}^{n} k = nk$, where k is a constant. Thus, $\sum_{i=1}^{4} 3 = 4 \cdot 3 = 12$. 

2. $\sum_{i=1}^{n} kx_i = k\sum_{i=1}^{n} x_i$, where k is a constant. 

3. $\sum_{i=1}^{n} (a + bx_i) = na + b\sum_{i=1}^{n} x_i$, where a and b are constants and where use is made of 
properties 1 and 2 above. 

4. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$. 

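The summation operator can also be extended to multiple sums. Thus, ΣΣ, the double 
summation operator, is defined as 

$$\sum_{i=1}^{n}\sum_{j=1}^{m} x_{ij} = \sum_{i=1}^{n}\left(x_{i1} + x_{i2} + \cdots + x_{im}\right)$$

$$= (x_{11} + x_{21} + \cdots + x_{n1}) + (x_{12} + x_{22} + \cdots + x_{n2}) + \cdots + (x_{1m} + x_{2m} + \cdots + x_{nm})$$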


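Some of the properties of ΣΣ are 

1. $\sum_{i=1}^{n}\sum_{j=1}^{m} x_{ij} = \sum_{j=1}^{m}\sum_{i=1}^{n} x_{ij}$; that is, the order in which the double summation is 
performed is interchangeable. 

2. $\sum_{i=1}^{n}\sum_{j=1}^{m} x_i y_j = \sum_{i=1}^{n} x_i \sum_{j=1}^{m} y_j$. 

3. $\sum_{i=1}^{n}\sum_{j=1}^{m} (x_{ij} + y_{ij}) = \sum_{i=1}^{n}\sum_{j=1}^{m} x_{ij} + \sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}$. 

4. $\left[\sum_{i=1}^{n} x_i\right]^2 = \sum_{i=1}^{n} x_i^2 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} x_i x_j = \sum_{i=1}^{n} x_i^2 + 2\sum_{i<j} x_i x_j$. 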

The product operator Π is defined as 

$$\prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdots x_n$$
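The summation and product rules above are easy to check numerically. The short Python sketch below does so for a few arbitrary illustrative values of x and y (the numbers, like the variable names, are our own choices, not taken from the text): it verifies properties 1 and 3 of Σ, properties 2 and 4 of the double sum, and evaluates the product operator with the standard-library `math.prod`.

```python
import math

x = [2.0, -1.0, 4.0, 3.0]   # arbitrary illustrative values
y = [1.0, 5.0, -2.0]
n = len(x)
k, a, b = 3.0, 1.5, 2.0     # arbitrary constants

# Property 1: summing a constant k over n terms gives n*k
assert sum(k for _ in range(n)) == n * k
# Property 3: sum(a + b*x_i) = n*a + b*sum(x_i)
assert math.isclose(sum(a + b * xi for xi in x), n * a + b * sum(x))
# Double-sum property 2: sum_i sum_j x_i*y_j = (sum_i x_i)(sum_j y_j)
double = sum(xi * yj for xi in x for yj in y)
assert math.isclose(double, sum(x) * sum(y))
# Double-sum property 4: (sum x_i)^2 = sum x_i^2 + 2 * sum_{i<j} x_i x_j
cross = sum(x[i] * x[j] for i in range(n) for j in range(i + 1, n))
assert math.isclose(sum(x) ** 2, sum(xi ** 2 for xi in x) + 2 * cross)
# Product operator: x_1 * x_2 * ... * x_n
print(math.prod(x))  # -24.0
```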


A.2 Sample Space, Sample Points, and Events 

The set of all possible outcomes of a random, or chance, experiment is called the population, 
or sample space, and each member of this sample space is called a sample point. Thus, in 
the experiment of tossing two coins, the sample space consists of these four possible 
outcomes: HH, HT, TH, and TT, where HH means a head on the first toss and also a head on the 
second toss, HT means a head on the first toss and a tail on the second toss, and so on. Each 
of the preceding occurrences constitutes a sample point. 

An event is a subset of the sample space. Thus, if we let A denote the occurrence of one 
head and one tail, then, of the preceding possible outcomes, only two belong to A, namely 
HT and TH. In this case A constitutes an event. Similarly, the occurrence of two heads in a 
toss of two coins is an event. Events are said to be mutually exclusive if the occurrence of 
one event precludes the occurrence of another event. If in the preceding example HH 
occurs, the occurrence of the event HT at the same time is not possible. Events are said to 
be (collectively) exhaustive if they exhaust all the possible outcomes of an experiment. 
Thus, in the example, the events (a) two heads, (b) two tails, and (c) one tail, one head 
exhaust all the outcomes; hence they are (collectively) exhaustive events. 

A.3 Probability and Random Variables 
Probability 

Let A be an event in a sample space. By P(A), the probability of the event A, we mean the 
proportion of times the event A will occur in repeated trials of an experiment. Alternatively, 
in a total of n possible equally likely outcomes of an experiment, if m of them are favorable 
to the occurrence of the event A, we define the ratio m/n as the relative frequency of A. For 
large values of n, this relative frequency will provide a very good approximation of the 
probability of A. 
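To see the relative-frequency idea at work, the following Python sketch simulates throws of a die, assumed fair, and reports the proportion of throws showing an even number, an event whose probability is 3/6 = 0.5. The helper `relative_frequency` and the fixed seed are illustrative assumptions; as the number of trials grows, the printed proportion should settle near 0.5.

```python
import random


def relative_frequency(event, trials, seed=0):
    """Share of `trials` simulated fair-die throws for which event(outcome) is true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(event(rng.randint(1, 6)) for _ in range(trials))
    return hits / trials


# Event A: "an even number shows up"
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(lambda face: face % 2 == 0, n))
```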

Properties of Probability 

P(A) is a real-valued function¹ and has these properties: 

1. $0 \le P(A) \le 1$ for every A. 

2. If A, B, C, ... constitute an exhaustive set of events, then $P(A + B + C + \cdots) = 1$, 
where A + B + C means A or B or C, and so forth. 

3. If A, B, C, ... are mutually exclusive events, then 

$$P(A + B + C + \cdots) = P(A) + P(B) + P(C) + \cdots$$


¹A function whose domain and range are subsets of real numbers is commonly referred to as a 
real-valued function. For details, see Alpha C. Chiang, Fundamental Methods of Mathematical Economics, 
3d ed., McGraw-Hill, 1984, Chapter 2. 
EXAMPLE 1 Consider the experiment of throwing a die numbered 1 through 6. The sample space 
consists of the outcomes 1, 2, 3, 4, 5, and 6. These six events therefore exhaust the entire 
sample space. The probability of any one of these numbers showing up is 1/6 since there 
are six equally likely outcomes and any one of them has an equal chance of showing up. 
Since 1, 2, 3, 4, 5, and 6 form an exhaustive set of events, P(1 + 2 + 3 + 4 + 5 + 6) = 1, 
where 1 + 2 + 3 + ··· means number 1 or number 2 or number 3, etc. And 
since 1, 2, ..., 6 are mutually exclusive events in that two numbers cannot occur 
simultaneously, P(1 + 2 + 3 + 4 + 5 + 6) = P(1) + P(2) + ··· + P(6) = 1. 


Random Variables 

A variable whose value is determined by the outcome of a chance experiment is called a 
random variable (rv). Random variables are usually denoted by the capital letters X, Y, Z, 
and so on, and the values taken by them are denoted by small letters x, y, z, and so on. 

A random variable may be either discrete or continuous. A discrete rv takes on only a 
finite (or countably infinite) number of values.² For example, in throwing two dice, each 
numbered 1 to 6, if we define the random variable X as the sum of the numbers showing 
on the dice, then X will take one of these values: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12. Hence 
it is a discrete random variable. A continuous rv, on the other hand, is one that can take 
on any value in some interval of values. Thus, the height of an individual is a continuous 
variable: in the range, say, 60 to 65 inches it can take any value, depending on the 
precision of measurement. 


A.4 Probability Density Function (PDF) 

Probability Density Function of a Discrete Random Variable 

Let X be a discrete rv taking distinct values $x_1, x_2, \ldots, x_n, \ldots$. Then the function 

$$f(x) = \begin{cases} P(X = x_i) & \text{for } i = 1, 2, \ldots \\ 0 & \text{for } x \ne x_i \end{cases}$$

is called the discrete probability density function (PDF) of X, where $P(X = x_i)$ means 
the probability that the discrete rv X takes the value of $x_i$. 


EXAMPLE 2 In a throw of two dice, the random variable X, the sum of the numbers shown on two 
dice, can take one of the 11 values shown. The PDF of this variable can be shown to be as 
follows (see also Figure A.1): 

x       2     3     4     5     6     7     8     9     10    11    12
f(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

These probabilities can be easily verified. In all there are 36 possible outcomes, of which 
one is favorable to number 2, two are favorable to number 3 (since the sum 3 can occur 
either as 1 on the first die and 2 on the second die or 2 on the first die and 1 on the second 
die), and so on. 
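The probabilities in Example 2 can be verified by brute-force enumeration. The Python sketch below (names such as `pdf` are our own) lists all 36 equally likely outcomes, counts how many produce each sum, and stores the results as exact fractions. Note that `Fraction` prints reduced values, so 2/36 appears as 1/18.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(d1 + d2 for d1, d2 in outcomes)

# Discrete PDF of X = sum of the two dice, as exact fractions
pdf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
for x, p in pdf.items():
    print(x, p)
```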


²For a simple discussion of the notion of countably infinite sets, see R. C. D. Allen, Basic Mathematics, 
Macmillan, London, 1964, p. 104. 
FIGURE A.1 Density function of the discrete random variable of Example 2. 



Probability Density Function of a Continuous Random Variable 

Let X be a continuous rv. Then f(x) is said to be the PDF of X if the following conditions 
are satisfied: 

$$f(x) \ge 0$$

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

$$\int_a^b f(x)\,dx = P(a < x < b)$$

where f(x) dx is known as the probability element (the probability associated with a small 
interval of a continuous variable) and where P(a < x < b) means the probability that X 
lies in the interval a to b. Geometrically, we have Figure A.2. 

For a continuous rv, in contrast with a discrete rv, the probability that X takes a specific 
value is zero;³ probability for such a variable is measurable only over a given range or 
interval, such as (a, b) shown in Figure A.2. 


EXAMPLE 3 Consider the following density function: 

$$f(x) = \frac{1}{9}x^2 \qquad 0 \le x \le 3$$

It can be readily verified that f(x) ≥ 0 for all x in the range 0 to 3 and that $\int_0^3 \frac{1}{9}x^2\,dx = 1$. 
(Note: The integral is $\left(\frac{1}{27}x^3\Big|_0^3\right) = 1$.) If we want to evaluate the above PDF between, say, 0 
and 1, we obtain $\int_0^1 \frac{1}{9}x^2\,dx = \left(\frac{1}{27}x^3\Big|_0^1\right) = \frac{1}{27}$; that is, the probability that x lies between 0 
and 1 is 1/27. 
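The integrals in Example 3 can also be approximated numerically. A minimal sketch, assuming composite Simpson's rule as the integration method (our choice; any standard quadrature would do), is shown below. Because Simpson's rule is exact for polynomials up to degree three, the results match 1 and 1/27 up to floating-point rounding.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pdf(x):
    """f(x) = x^2 / 9 on 0 <= x <= 3 (Example 3)."""
    return x ** 2 / 9


print(simpson(pdf, 0, 3))  # total probability, approx. 1
print(simpson(pdf, 0, 1))  # P(0 <= X <= 1), approx. 1/27
```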


FIGURE A.2 Density function of a continuous random variable. 



³Note: $\int_a^a f(x)\,dx = 0$. 
Joint Probability Density Functions 

Discrete Joint PDF 

Let X and Y be two discrete random variables. Then the function 

$$f(x, y) = \begin{cases} P(X = x \text{ and } Y = y) & \\ 0 & \text{when } X \ne x \text{ and } Y \ne y \end{cases}$$

is known as the discrete joint probability density function and gives the (joint) 
probability that X takes the value of x and Y takes the value of y. 


EXAMPLE 4 The following table gives the joint PDF of the discrete variables X and Y. 

                    X
             -2      0      2      3
Y    3      0.27   0.08   0.16   0
     6      0      0.04   0.10   0.35


This table tells us that the probability that X takes the value of -2 while Y simultaneously 
takes the value of 3 is 0.27 and that the probability that X takes the value of 3 while Y takes 
the value of 6 is 0.35, and so on. 


Marginal Probability Density Function 

In relation to f(x, y), f(x) and f(y) are called individual, or marginal, probability 
density functions. These marginal PDFs are derived as follows: 

$$f(x) = \sum_y f(x, y) \qquad \text{marginal PDF of } X$$

$$f(y) = \sum_x f(x, y) \qquad \text{marginal PDF of } Y$$

where, for example, $\sum_y$ means the sum over all values of Y and $\sum_x$ means the sum over all 
values of X. 


EXAMPLE 5 Consider the data given in Example 4. The marginal PDF of X is obtained as follows: 

$$f(x = -2) = \sum_y f(x, y) = 0.27 + 0 = 0.27$$
$$f(x = 0) = \sum_y f(x, y) = 0.08 + 0.04 = 0.12$$
$$f(x = 2) = \sum_y f(x, y) = 0.16 + 0.10 = 0.26$$
$$f(x = 3) = \sum_y f(x, y) = 0 + 0.35 = 0.35$$

Likewise, the marginal PDF of Y is obtained as 

$$f(y = 3) = \sum_x f(x, y) = 0.27 + 0.08 + 0.16 + 0 = 0.51$$
$$f(y = 6) = \sum_x f(x, y) = 0 + 0.04 + 0.10 + 0.35 = 0.49$$

As this example shows, to obtain the marginal PDF of X we add the column numbers, and 
to obtain the marginal PDF of Y we add the row numbers. Notice that $\sum_x f(x)$ over all 
values of X is 1, as is $\sum_y f(y)$ over all values of Y (why?). 
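The column and row sums of Example 5 can be sketched in Python as follows; the dictionary layout keyed by (x, y) pairs and the helper `marginal` are illustrative choices, not part of the text.

```python
# Joint PDF of Example 4, keyed by (x, y)
joint = {
    (-2, 3): 0.27, (0, 3): 0.08, (2, 3): 0.16, (3, 3): 0.00,
    (-2, 6): 0.00, (0, 6): 0.04, (2, 6): 0.10, (3, 6): 0.35,
}


def marginal(joint, axis):
    """Sum the joint PDF over the other variable (axis 0 -> f(x), axis 1 -> f(y))."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


f_x = marginal(joint, 0)
f_y = marginal(joint, 1)
print({x: round(p, 2) for x, p in f_x.items()})  # {-2: 0.27, 0: 0.12, 2: 0.26, 3: 0.35}
print({y: round(p, 2) for y, p in f_y.items()})  # {3: 0.51, 6: 0.49}
```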
Conditional PDF 

As noted in Chapter 2, in regression analysis we are often interested in studying the 
behavior of one variable conditional upon the value(s) of another variable(s). This can be done by 
considering the conditional PDF. The function 

$$f(x \mid y) = P(X = x \mid Y = y)$$

is known as the conditional PDF of X; it gives the probability that X takes on the value of 
x given that Y has assumed the value y. Similarly, 

$$f(y \mid x) = P(Y = y \mid X = x)$$

which gives the conditional PDF of Y. 

The conditional PDFs may be obtained as follows: 

$$f(x \mid y) = \frac{f(x, y)}{f(y)} \qquad \text{conditional PDF of } X$$

$$f(y \mid x) = \frac{f(x, y)}{f(x)} \qquad \text{conditional PDF of } Y$$

As the preceding expressions show, the conditional PDF of one variable can be expressed 
as the ratio of the joint PDF to the marginal PDF of another (conditioning) variable. 


EXAMPLE 6 Continuing with Examples 4 and 5, let us compute the following conditional probabilities: 

$$f(X = -2 \mid Y = 3) = \frac{f(X = -2, Y = 3)}{f(Y = 3)} = 0.27/0.51 = 0.53$$

Notice that the unconditional probability f(X = −2) is 0.27, but if Y has assumed the 
value of 3, the probability that X takes the value of −2 is 0.53. 

$$f(X = 2 \mid Y = 6) = \frac{f(X = 2, Y = 6)}{f(Y = 6)} = 0.10/0.49 = 0.20$$

Again note that the unconditional probability that X takes the value of 2 is 0.26, which is 
different from 0.20, which is its value if Y assumes the value of 6. 
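Conditional probabilities are just joint probabilities rescaled by a marginal. The sketch below, with a hypothetical helper `conditional_x_given_y`, reproduces the two numbers of Example 6 from the joint PDF of Example 4:

```python
# Joint PDF of Example 4, keyed by (x, y)
joint = {
    (-2, 3): 0.27, (0, 3): 0.08, (2, 3): 0.16, (3, 3): 0.00,
    (-2, 6): 0.00, (0, 6): 0.04, (2, 6): 0.10, (3, 6): 0.35,
}


def conditional_x_given_y(joint, y):
    """f(x | Y = y) = f(x, y) / f(y) for every x in the table."""
    f_y = sum(p for (_, yy), p in joint.items() if yy == y)  # marginal f(y)
    return {x: p / f_y for (x, yy), p in joint.items() if yy == y}


cond3 = conditional_x_given_y(joint, 3)
cond6 = conditional_x_given_y(joint, 6)
print(round(cond3[-2], 2))  # 0.53
print(round(cond6[2], 2))   # 0.2
```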


Statistical Independence 

Two random variables X and Y are statistically independent if and only if 
$$f(x, y) = f(x)\,f(y)$$

that is, if the joint PDF can be expressed as the product of the marginal PDFs. 


EXAMPLE 7 A bag contains three balls numbered 1, 2, and 3. Two balls are drawn at random, with 
replacement, from the bag (i.e., the first ball drawn is replaced before the second is 
drawn). Let X denote the number of the first ball drawn and Y the number of the second 
ball drawn. The following table gives the joint PDF of X and Y. 

                  X
             1      2      3
Y    1      1/9    1/9    1/9
     2      1/9    1/9    1/9
     3      1/9    1/9    1/9

Now f(X = 1, Y = 1) = 1/9, f(X = 1) = 1/3 (obtained by summing the first column), and 
f(Y = 1) = 1/3 (obtained by summing the first row). Since f(X = 1, Y = 1) = f(X = 1)f(Y = 1), 
the joint PDF factors into the individual PDFs for this combination, and it can be easily 
checked that it does so for every other combination of X and Y values given in the above 
table. Hence the two variables are statistically independent. 

It can be shown that the X and Y variables given in Example 4 are not statistically 
independent since the product of the two marginal PDFs is not equal to the joint PDF. 
(Note: f(X, Y)= f(X)f(Y ) must be true for all combinations of X and Y if the two 
variables are to be statistically independent.) 
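Checking independence means checking the factorization f(x, y) = f(x)f(y) at every cell of the table, not just one. The Python sketch below (the function `is_independent` and its tolerance are our own assumptions) applies this check to the ball-drawing experiment of Example 7, where every cell is taken to be 1/9 since three equally likely balls are drawn with replacement, and to the joint PDF of Example 4:

```python
from fractions import Fraction
from itertools import product


def is_independent(joint, tol=1e-12):
    """True if f(x, y) == f(x) f(y) for every (x, y) pair in the table."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    f_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    f_y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(abs(joint.get((x, y), 0) - f_x[x] * f_y[y]) <= tol
               for x, y in product(xs, ys))


# Example 7: two draws with replacement from balls numbered 1, 2, 3
balls = {(x, y): Fraction(1, 9) for x, y in product([1, 2, 3], repeat=2)}
# Example 4 joint PDF, keyed by (x, y)
example4 = {(-2, 3): 0.27, (0, 3): 0.08, (2, 3): 0.16, (3, 3): 0.0,
            (-2, 6): 0.0, (0, 6): 0.04, (2, 6): 0.10, (3, 6): 0.35}

print(is_independent(balls))     # True
print(is_independent(example4))  # False
```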

Continuous Joint PDF 

The PDF f(x, y) of two continuous variables X and Y is such that 

$$f(x, y) \ge 0$$

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$$

$$\int_c^d\int_a^b f(x, y)\,dx\,dy = P(a < x < b,\ c < y < d)$$



EXAMPLE 8 Consider the following PDF: 

$$f(x, y) = 2 - x - y \qquad 0 \le x \le 1;\ 0 \le y \le 1$$

It is obvious that f(x, y) ≥ 0. Moreover,⁴ 

$$\int_0^1\int_0^1 (2 - x - y)\,dx\,dy = 1$$

The marginal PDFs of X and Y can be obtained as 

$$f(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \qquad \text{marginal PDF of } X$$

$$f(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \qquad \text{marginal PDF of } Y$$


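⁴$$\int_0^1\left[\int_0^1 (2 - x - y)\,dx\right]dy = \int_0^1\left[\left(2x - \frac{x^2}{2} - xy\right)\Big|_0^1\right]dy = \int_0^1\left(\frac{3}{2} - y\right)dy = \left(\frac{3}{2}y - \frac{y^2}{2}\right)\Big|_0^1 = 1$$

Note: The expression $\left(\frac{3}{2}y - \frac{y^2}{2}\right)\Big|_0^1$ means the expression in the parentheses is to be evaluated at the 
upper limit value of 1 and the lower limit value of 0; the latter value is subtracted from the former to 
obtain the value of the integral. Thus, in the preceding example the limits are (3/2 − 1/2) at y = 1 and 0 
at y = 0, giving the value of the integral as 1. 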
EXAMPLE 9 The two marginal PDFs of the joint PDF given in Example 8 are as follows: 

$$f(x) = \int_0^1 f(x, y)\,dy = \int_0^1 (2 - x - y)\,dy = \left(2y - xy - \frac{y^2}{2}\right)\Big|_0^1 = \frac{3}{2} - x \qquad 0 \le x \le 1$$

$$f(y) = \int_0^1 (2 - x - y)\,dx = \left(2x - xy - \frac{x^2}{2}\right)\Big|_0^1 = \frac{3}{2} - y \qquad 0 \le y \le 1$$

To see if the two variables of Example 8 are statistically independent, we need to find out 
if f(x, y) = f(x)f(y). Since $(2 - x - y) \ne \left(\frac{3}{2} - x\right)\left(\frac{3}{2} - y\right)$, we can say that the two 
variables are not statistically independent. 
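The integrals of Examples 8 and 9 can be checked numerically. The sketch below assumes the midpoint rule on an n × n grid (our choice; because 2 − x − y is linear, the midpoint rule reproduces the exact integrals up to rounding). By the symmetry of f(x, y) in x and y, the marginal of Y has the same form as that of X, so `marginal_x` is reused for f(y = 0).

```python
def joint_pdf(x, y):
    """f(x, y) = 2 - x - y on the unit square (Example 8)."""
    return 2 - x - y


def midpoint_2d(f, n=200):
    """Midpoint-rule approximation of the double integral of f over [0, 1] x [0, 1]."""
    h = 1 / n
    return sum(f((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h


def marginal_x(x, n=200):
    """f(x) = integral of f(x, y) dy over [0, 1], by the midpoint rule."""
    h = 1 / n
    return sum(joint_pdf(x, (j + 0.5) * h) for j in range(n)) * h


print(midpoint_2d(joint_pdf))        # approx. 1.0
print(marginal_x(0.25), 1.5 - 0.25)  # both approx. 1.25
# Independence fails at (0, 0): f(0, 0) = 2, but f(x = 0) f(y = 0) = 1.5 * 1.5
print(joint_pdf(0, 0), marginal_x(0) * marginal_x(0))
```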


A.5 Characteristics of Probability Distributions 

A probability distribution can often be summarized in terms of a few of its characteristics, 
known as the moments of the distribution. Two of the most widely used moments are the 
mean, or expected value, and the variance. 

Expected Value 

The expected value of a discrete rv X, denoted by E(X), is defined as follows: 

$$E(X) = \sum_x x f(x)$$

where $\sum_x$ means the sum over all values of X and where f(x) is the (discrete) PDF of X. 


EXAMPLE 10 Consider the probability distribution of the sum of two numbers in the throw of two dice 
given in Example 2. (See Figure A.1.) Multiplying the various X values given there by their 
probabilities and summing over all the observations, we obtain: 

$$E(X) = 2\left(\tfrac{1}{36}\right) + 3\left(\tfrac{2}{36}\right) + 4\left(\tfrac{3}{36}\right) + \cdots + 12\left(\tfrac{1}{36}\right) = 7$$

which is the average value of the sum of numbers observed in a throw of two dice. 


EXAMPLE 11 Compute E(X) and E(Y) for the data given in Example 4. We have seen that 

x       -2     0      2      3
f(x)   0.27   0.12   0.26   0.35

Therefore, 

$$E(X) = \sum_x x f(x) = (-2)(0.27) + (0)(0.12) + (2)(0.26) + (3)(0.35) = 1.03$$
Similarly, 

y       3      6
f(y)   0.51   0.49

$$E(Y) = \sum_y y f(y) = (3)(0.51) + (6)(0.49) = 4.47$$


The expected value of a continuous rv is defined as 


$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$


The only difference between this case and the expected value of a discrete rv is that we 
replace the summation symbol by the integral symbol. 


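EXAMPLE 12 Let us find out the expected value of the continuous PDF given in Example 3. 

$$E(X) = \int_0^3 x\left(\frac{x^2}{9}\right)dx = \frac{1}{9}\left(\frac{x^4}{4}\right)\Big|_0^3 = \frac{9}{4} = 2.25$$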


Properties of Expected Values 

1. The expected value of a constant is the constant itself. Thus, if b is a constant, E(b) = b. 

2. If a and b are constants, 

E(aX + b) = aE(X) + b 

This can be generalized. If $X_1, X_2, \ldots, X_N$ are N random variables and $a_1, a_2, \ldots, a_N$ 
and b are constants, then 

$$E(a_1X_1 + a_2X_2 + \cdots + a_NX_N + b) = a_1E(X_1) + a_2E(X_2) + \cdots + a_NE(X_N) + b$$

3. If X and Y are independent random variables, then 

$$E(XY) = E(X)E(Y)$$

That is, the expectation of the product XY is the product of the (individual) expectations 
of X and Y. 

However, note that 

$$E\left(\frac{X}{Y}\right) \ne \frac{E(X)}{E(Y)}$$

even if X and Y are independent. 
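4. If X is a random variable with PDF f(x) and if g(X) is any function of X, then 

$$E[g(X)] = \sum_x g(X) f(x) \qquad \text{if } X \text{ is discrete}$$
$$E[g(X)] = \int_{-\infty}^{\infty} g(X) f(x)\,dx \qquad \text{if } X \text{ is continuous}$$

Thus, if $g(X) = X^2$, 

$$E(X^2) = \sum_x x^2 f(x) \qquad \text{if } X \text{ is discrete}$$
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx \qquad \text{if } X \text{ is continuous}$$

EXAMPLE 13 Consider the following PDF: 

x       -2     1     2
f(x)   5/8   1/8   2/8

$$E(X) = -2\left(\tfrac{5}{8}\right) + 1\left(\tfrac{1}{8}\right) + 2\left(\tfrac{2}{8}\right) = -\frac{5}{8}$$

$$E(X^2) = 4\left(\tfrac{5}{8}\right) + 1\left(\tfrac{1}{8}\right) + 4\left(\tfrac{2}{8}\right) = \frac{29}{8}$$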


Variance 

Let X be a random variable and let E(X) = μ. The distribution, or spread, of the X values 
around the expected value can be measured by the variance, which is defined as 

$$\operatorname{var}(X) = \sigma_X^2 = E(X - \mu)^2$$

The positive square root of $\sigma_X^2$, $\sigma_X$, is defined as the standard deviation of X. The variance 
or standard deviation gives an indication of how closely or widely the individual X values 
are spread around their mean value. 

The variance defined previously is computed as follows: 

$$\operatorname{var}(X) = \sum_x (X - \mu)^2 f(x) \qquad \text{if } X \text{ is a discrete rv}$$
$$\operatorname{var}(X) = \int_{-\infty}^{\infty} (X - \mu)^2 f(x)\,dx \qquad \text{if } X \text{ is a continuous rv}$$

For computational convenience, the variance formula given above can also be expressed as 

$$\operatorname{var}(X) = \sigma_X^2 = E(X - \mu)^2 = E(X^2) - \mu^2 = E(X^2) - [E(X)]^2$$

Applying this formula, it can be seen that the variance of the random variable given in 
Example 13 is $\frac{29}{8} - \left(-\frac{5}{8}\right)^2 = \frac{207}{64} = 3.23$. 
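The discrete formulas E[g(X)] = Σ g(x)f(x) and var(X) = E(X²) − [E(X)]² translate directly into code. This Python sketch (helper names `expect` and `variance` are ours) uses exact fractions so that the results of Example 13, and the mean of 7 from Example 10, come out exactly:

```python
from fractions import Fraction


def expect(pdf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) f(x) for a discrete PDF given as {x: f(x)}."""
    return sum(g(x) * p for x, p in pdf.items())


def variance(pdf):
    """var(X) = E(X^2) - [E(X)]^2."""
    return expect(pdf, lambda x: x ** 2) - expect(pdf) ** 2


example13 = {-2: Fraction(5, 8), 1: Fraction(1, 8), 2: Fraction(2, 8)}
print(expect(example13))                    # -5/8
print(expect(example13, lambda x: x ** 2))  # 29/8
print(variance(example13))                  # 207/64

# Example 10: sum of two dice, f(s) = (6 - |s - 7|) / 36
dice = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
print(expect(dice))                         # 7
```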
EXAMPLE 14 Let us find the variance of the random variable given in Example 3. 

$$\operatorname{var}(X) = E(X^2) - [E(X)]^2$$

Now 

$$E(X^2) = \int_0^3 x^2\left(\frac{x^2}{9}\right)dx = \frac{1}{9}\left(\frac{x^5}{5}\right)\Big|_0^3 = 243/45 = 27/5$$

Since E(X) = 9/4 (see Example 12), we finally have 

$$\operatorname{var}(X) = 243/45 - (9/4)^2 = 243/720 = 0.34$$
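For a polynomial PDF such as f(x) = x²/9 on [0, 3], the moments in Examples 12 and 14 can be computed exactly by integrating term by term. The following sketch assumes the PDF is stored as a list of polynomial coefficients, an illustrative representation of our own:

```python
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum(c_k * x**k), given coeffs = [c_0, c_1, ...]."""
    return sum(Fraction(c) * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))


pdf = [0, 0, Fraction(1, 9)]  # f(x) = x^2 / 9 on [0, 3] (Example 3)


def moment(r):
    """E(X^r): multiply f(x) by x^r by shifting its coefficients up r places."""
    return integrate_poly([0] * r + pdf, 0, 3)


mean = moment(1)              # Example 12
var = moment(2) - mean ** 2   # Example 14
print(mean, moment(2), var)   # 9/4 27/5 27/80
```

Here 27/80 = 243/720 ≈ 0.34, matching Example 14.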


Properties of Variance 

1. $E(X - \mu)^2 = E(X^2) - \mu^2$, as noted before. 

2. The variance of a constant is zero. 

3. If a and b are constants, then 

$$\operatorname{var}(aX + b) = a^2\operatorname{var}(X)$$

4. If X and Y are independent random variables, then 

$$\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y)$$
$$\operatorname{var}(X - Y) = \operatorname{var}(X) + \operatorname{var}(Y)$$

This can be generalized to more than two independent variables. 

5. If X and Y are independent rv's and a and b are constants, then 

$$\operatorname{var}(aX + bY) = a^2\operatorname{var}(X) + b^2\operatorname{var}(Y)$$

Covariance 

Let X and Y be two rv's with means $\mu_x$ and $\mu_y$, respectively. Then the covariance between 
the two variables is defined as 

$$\operatorname{cov}(X, Y) = E\{(X - \mu_x)(Y - \mu_y)\} = E(XY) - \mu_x\mu_y$$

It can be readily seen that the variance of a variable is the covariance of that variable with 
itself. 

The covariance is computed as follows: 

$$\operatorname{cov}(X, Y) = \sum_y\sum_x (X - \mu_x)(Y - \mu_y) f(x, y) = \sum_y\sum_x XY f(x, y) - \mu_x\mu_y$$

if X and Y are discrete random variables, and 

$$\operatorname{cov}(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (X - \mu_x)(Y - \mu_y) f(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} XY f(x, y)\,dx\,dy - \mu_x\mu_y$$

if X and Y are continuous random variables. 

Properties of Covariance 

1. If X and Y are independent, their covariance is zero, for 

$$\operatorname{cov}(X, Y) = E(XY) - \mu_x\mu_y = \mu_x\mu_y - \mu_x\mu_y = 0$$

since $E(XY) = E(X)E(Y) = \mu_x\mu_y$ when X and Y are independent. 

2. 

$$\operatorname{cov}(a + bX, c + dY) = bd\operatorname{cov}(X, Y)$$

where a, b, c, and d are constants. 


EXAMPLE 15 Let us find out the covariance between discrete random variables X and Y whose joint PDF 
is as shown in Example 4. From Example 11 we already know that $\mu_x = E(X) = 1.03$ and 
$\mu_y = E(Y) = 4.47$. 

$$E(XY) = \sum_y\sum_x XY f(x, y)$$
$$= (-2)(3)(0.27) + (0)(3)(0.08) + (2)(3)(0.16) + (3)(3)(0)$$
$$+ (-2)(6)(0) + (0)(6)(0.04) + (2)(6)(0.10) + (3)(6)(0.35)$$
$$= 6.84$$

Therefore, 

$$\operatorname{cov}(X, Y) = E(XY) - \mu_x\mu_y = 6.84 - (1.03)(4.47) = 2.24$$
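The covariance calculation above, for the joint PDF of Example 4, can be reproduced in a few lines of Python. The sketch computes the covariance both from the definition E{(X − μ_x)(Y − μ_y)} and from the shortcut E(XY) − μ_xμ_y; the two agree, as they must. The printed value, 2.2359, rounds to the 2.24 reported in the text.

```python
# Joint PDF of Example 4, keyed by (x, y)
joint = {(-2, 3): 0.27, (0, 3): 0.08, (2, 3): 0.16, (3, 3): 0.00,
         (-2, 6): 0.00, (0, 6): 0.04, (2, 6): 0.10, (3, 6): 0.35}

mu_x = sum(x * p for (x, _), p in joint.items())      # E(X) = 1.03
mu_y = sum(y * p for (_, y), p in joint.items())      # E(Y) = 4.47
e_xy = sum(x * y * p for (x, y), p in joint.items())  # E(XY) = 6.84

cov_shortcut = e_xy - mu_x * mu_y                     # E(XY) - mu_x mu_y
cov_definition = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
print(round(cov_shortcut, 4), round(cov_definition, 4))  # 2.2359 2.2359
```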


Correlation Coefficient 

The (population) correlation coefficient ρ (rho) is defined as 

$$\rho = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\operatorname{var}(Y)}} = \frac{\operatorname{cov}(X, Y)}{\sigma_x\sigma_y}$$

Thus defined, ρ is a measure of linear association between two variables and lies between 
−1 and +1, −1 indicating perfect negative association and +1 indicating perfect positive 
association. 

From the preceding formula, it can be seen that 

$$\operatorname{cov}(X, Y) = \rho\,\sigma_x\sigma_y$$
EXAMPLE 16 Compute the coefficient of correlation for the data of Example 4. 

From the PDFs given in Example 11 it can be easily shown that $\sigma_x = 2.05$ and 
$\sigma_y = 1.50$. We have already shown that cov(X, Y) = 2.24. Therefore, applying the 
preceding formula we obtain ρ = 2.24/[(2.05)(1.50)] = 0.73. 
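Example 16's standard deviations and correlation coefficient can be sketched in Python as follows; `moments` is a hypothetical helper, and the standard deviations are computed from the same joint table rather than taken as given:

```python
import math

# Joint PDF of Example 4, keyed by (x, y)
joint = {(-2, 3): 0.27, (0, 3): 0.08, (2, 3): 0.16, (3, 3): 0.00,
         (-2, 6): 0.00, (0, 6): 0.04, (2, 6): 0.10, (3, 6): 0.35}


def moments(joint):
    """Return (sigma_x, sigma_y, cov) for a discrete joint PDF {(x, y): p}."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    var_x = sum((x - ex) ** 2 * p for (x, _), p in joint.items())
    var_y = sum((y - ey) ** 2 * p for (_, y), p in joint.items())
    cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())
    return math.sqrt(var_x), math.sqrt(var_y), cov


sx, sy, cov = moments(joint)
rho = cov / (sx * sy)
print(round(sx, 2), round(sy, 2), round(rho, 2))  # 2.05 1.5 0.73
```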


Variances of Correlated Variables 
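Let X and Y be two rv's. Then 

$$\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\rho\sigma_x\sigma_y$$

$$\operatorname{var}(X - Y) = \operatorname{var}(X) + \operatorname{var}(Y) - 2\operatorname{cov}(X, Y) = \operatorname{var}(X) + \operatorname{var}(Y) - 2\rho\sigma_x\sigma_y$$

If, however, X and Y are independent, cov(X, Y) is zero, in which case var(X + Y) and 
var(X − Y) are both equal to var(X) + var(Y), as noted previously. 

The preceding results can be generalized as follows. Let $\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$; 
then the variance of the linear combination $\sum X_i$ is 

$$\operatorname{var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n}\operatorname{var}X_i + 2\sum_{i<j}\operatorname{cov}(X_i, X_j) = \sum_{i=1}^{n}\operatorname{var}X_i + 2\sum_{i<j}\rho_{ij}\sigma_i\sigma_j$$

where $\rho_{ij}$ is the correlation coefficient between $X_i$ and $X_j$ and where $\sigma_i$ and $\sigma_j$ are the 
standard deviations of $X_i$ and $X_j$. 

Thus, 

$$\operatorname{var}(X_1 + X_2 + X_3) = \operatorname{var}X_1 + \operatorname{var}X_2 + \operatorname{var}X_3 + 2\operatorname{cov}(X_1, X_2) + 2\operatorname{cov}(X_1, X_3) + 2\operatorname{cov}(X_2, X_3)$$

$$= \operatorname{var}X_1 + \operatorname{var}X_2 + \operatorname{var}X_3 + 2\rho_{12}\sigma_1\sigma_2 + 2\rho_{13}\sigma_1\sigma_3 + 2\rho_{23}\sigma_2\sigma_3$$

where $\sigma_1$, $\sigma_2$, and $\sigma_3$ are, respectively, the standard deviations of $X_1$, $X_2$, and $X_3$ and where 
$\rho_{12}$ is the correlation coefficient between $X_1$ and $X_2$, $\rho_{13}$ that between $X_1$ and $X_3$, and $\rho_{23}$ 
that between $X_2$ and $X_3$. 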
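The formula var(X + Y) = var(X) + var(Y) + 2 cov(X, Y) can be checked on the joint PDF of Example 4 by computing the variance of the sum X + Y directly from the joint table. A minimal Python sketch (the helper `expect` is our own):

```python
# Joint PDF of Example 4, keyed by (x, y)
joint = {(-2, 3): 0.27, (0, 3): 0.08, (2, 3): 0.16, (3, 3): 0.00,
         (-2, 6): 0.00, (0, 6): 0.04, (2, 6): 0.10, (3, 6): 0.35}


def expect(g):
    """E[g(X, Y)] under the Example 4 joint PDF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mx, my = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mx) ** 2)
var_y = expect(lambda x, y: (y - my) ** 2)
cov = expect(lambda x, y: (x - mx) * (y - my))

# Direct: variance of the random variable S = X + Y, whose mean is mx + my
var_sum_direct = expect(lambda x, y: (x + y - mx - my) ** 2)
var_sum_formula = var_x + var_y + 2 * cov
print(round(var_sum_direct, 4), round(var_sum_formula, 4))  # 10.93 10.93
```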

Conditional Expectation and Conditional Variance 

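Let f(x, y) be the joint PDF of random variables X and Y. The conditional expectation of X, 
given Y = y, is defined as 

$$E(X \mid Y = y) = \sum_x x f(x \mid Y = y) \qquad \text{if } X \text{ is discrete}$$

$$E(X \mid Y = y) = \int_{-\infty}^{\infty} x f(x \mid Y = y)\,dx \qquad \text{if } X \text{ is continuous}$$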
where E(X | Y = y) means the conditional expectation of X given Y = y and where 
f(x | Y = y) is the conditional PDF of X. The conditional expectation of Y, E(Y | X = x), 
is defined similarly. 

Conditional Expectation 

Note that E(X | Y) is a random variable because it is a function of the conditioning variable 
Y. However, E(X | Y = y), where y is a specific value of Y, is a constant. 

Conditional Variance 

The conditional variance of X given Y = y is defined as 

$$\operatorname{var}(X \mid Y = y) = E\{[X - E(X \mid Y = y)]^2 \mid Y = y\}$$

$$= \sum_x [X - E(X \mid Y = y)]^2 f(x \mid Y = y) \qquad \text{if } X \text{ is discrete}$$

$$= \int_{-\infty}^{\infty} [X - E(X \mid Y = y)]^2 f(x \mid Y = y)\,dx \qquad \text{if } X \text{ is continuous}$$

EXAMPLE 17 Compute E(Y | X = 2) and var(Y | X = 2) for the data given in Example 4. 

$$E(Y \mid X = 2) = \sum_y y f(Y = y \mid X = 2)$$
$$= 3f(Y = 3 \mid X = 2) + 6f(Y = 6 \mid X = 2)$$
$$= 3(0.16/0.26) + 6(0.10/0.26)$$
$$= 4.15$$

Note: f(Y = 3 | X = 2) = f(Y = 3, X = 2)/f(X = 2) = 0.16/0.26, and 
f(Y = 6 | X = 2) = f(Y = 6, X = 2)/f(X = 2) = 0.10/0.26, so 

$$\operatorname{var}(Y \mid X = 2) = \sum_y [Y - E(Y \mid X = 2)]^2 f(Y \mid X = 2)$$
$$= (3 - 4.15)^2(0.16/0.26) + (6 - 4.15)^2(0.10/0.26)$$
$$= 2.13$$
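The steps of Example 17 (form the conditional PDF f(y | X = x), then take its mean and variance) can be sketched in Python; `cond_mean_var_y` is an illustrative name, not from the text:

```python
# Joint PDF of Example 4, keyed by (x, y)
joint = {(-2, 3): 0.27, (0, 3): 0.08, (2, 3): 0.16, (3, 3): 0.00,
         (-2, 6): 0.00, (0, 6): 0.04, (2, 6): 0.10, (3, 6): 0.35}


def cond_mean_var_y(joint, x):
    """E(Y | X = x) and var(Y | X = x) from a discrete joint PDF {(x, y): p}."""
    f_x = sum(p for (xx, _), p in joint.items() if xx == x)          # marginal f(x)
    cond = {y: p / f_x for (xx, y), p in joint.items() if xx == x}  # f(y | X = x)
    mean = sum(y * q for y, q in cond.items())
    var = sum((y - mean) ** 2 * q for y, q in cond.items())
    return mean, var


m, v = cond_mean_var_y(joint, 2)
print(round(m, 2), round(v, 2))  # 4.15 2.13
```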


Properties of Conditional Expectation and Conditional Variance 

1. If f(X) is a function of X, then E(f(X) | X) = f(X); that is, the function of X 
behaves as a constant in computation of its expectation conditional on X. Thus, 
$E(X^3 \mid X) = X^3$; this is because, if X is known, $X^3$ is also known. 

2. If f(X) and g(X) are functions of X, then 

$$E[f(X)Y + g(X) \mid X] = f(X)E(Y \mid X) + g(X)$$

For example, $E[XY + cX^2 \mid X] = XE(Y \mid X) + cX^2$, where c is a constant. 

3. If X and Y are independent, E(Y | X) = E(Y). That is, if X and Y are independent 
random variables, then the conditional expectation of Y, given X, is the same as the 
unconditional expectation of Y. 
4. The law of iterated expectations. It is interesting to note the following relation between 
the unconditional expectation of a random variable Y, E(Y), and its conditional 
expectation based on another random variable X, E(Y | X): 

$$E(Y) = E_X[E(Y \mid X)]$$

This is known as the law of iterated expectations, which in the present context states that 
the marginal, or unconditional, expectation of Y is equal to the expectation of its 
conditional expectation, the symbol $E_X$ denoting that the expectation is taken over the values 
of X. Put simply, this law states that if we first obtain E(Y | X) as a function of X and take 
its expected value over the distribution of X values, we wind up with E(Y), the 
unconditional expectation of Y. The reader can verify this using the data given in Example 4. 

An implication of the law of iterated expectations is that if the conditional mean of Y 
given X (i.e., E[Y | X]) is zero, then the (unconditional) mean of Y is also zero. This 
follows immediately because in that case 

$$E[E(Y \mid X)] = E[0] = 0$$

5. If X and Y are independent, then var(Y | X) = var(Y). 

6. $\operatorname{var}(Y) = E[\operatorname{var}(Y \mid X)] + \operatorname{var}[E(Y \mid X)]$; that is, the (unconditional) variance of Y is 
equal to the expectation of the conditional variance of Y plus the variance of the conditional 
expectation of Y. 
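As the text invites, the law of iterated expectations (property 4), and also the variance decomposition in property 6, can be verified with the data of Example 4. A sketch in Python, assuming the joint PDF is stored as a dictionary keyed by (x, y); the helper `cond_moments` is our own name:

```python
# Joint PDF of Example 4, keyed by (x, y)
joint = {(-2, 3): 0.27, (0, 3): 0.08, (2, 3): 0.16, (3, 3): 0.00,
         (-2, 6): 0.00, (0, 6): 0.04, (2, 6): 0.10, (3, 6): 0.35}

xs = sorted({x for x, _ in joint})
f_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in xs}


def cond_moments(x):
    """Return E(Y | X = x) and var(Y | X = x)."""
    cond = {y: p / f_x[x] for (xx, y), p in joint.items() if xx == x}
    m = sum(y * q for y, q in cond.items())
    return m, sum((y - m) ** 2 * q for y, q in cond.items())


cm = {x: cond_moments(x) for x in xs}
e_y = sum(y * p for (_, y), p in joint.items())
var_y = sum((y - e_y) ** 2 * p for (_, y), p in joint.items())

# Law of iterated expectations: E(Y) = E_X[E(Y | X)]
iterated = sum(f_x[x] * cm[x][0] for x in xs)
# Variance decomposition: var(Y) = E[var(Y | X)] + var[E(Y | X)]
within = sum(f_x[x] * cm[x][1] for x in xs)
between = sum(f_x[x] * (cm[x][0] - iterated) ** 2 for x in xs)
print(round(e_y, 4), round(iterated, 4))            # 4.47 4.47
print(round(var_y, 4), round(within + between, 4))  # 2.2491 2.2491
```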

Higher Moments of Probability Distributions 

Although mean, variance, and covariance are the most frequently used summary measures 
of univariate and multivariate PDFs, we occasionally need to consider higher moments of 
the PDFs, such as the third and the fourth moments. The third and fourth moments of a 
univariate PDF f(x) around its mean value (μ) are defined as 

Third moment: $E(X - \mu)^3$ 

Fourth moment: $E(X - \mu)^4$ 

In general, the rth moment about the mean is defined as 

rth moment: $E(X - \mu)^r$ 

The third and fourth moments of a distribution are often used in studying the "shape" 
of a probability distribution, in particular, its skewness, S (i.e., lack of symmetry) and 
kurtosis, K (i.e., tallness or flatness), as shown in Figure A.3. 

One measure of skewness is defined as 

$$S = \frac{E(X - \mu)^3}{\sigma^3} = \frac{\text{third moment about the mean}}{\text{cube of the standard deviation}}$$

A commonly used measure of kurtosis is given by 

$$K = \frac{E(X - \mu)^4}{[E(X - \mu)^2]^2} = \frac{\text{fourth moment about the mean}}{\text{square of the second moment}}$$
FIGURE A.3 (a) Skewness; (b) kurtosis. 

PDFs with values of K less than 3 are called platykurtic (fat or short-tailed), and those 
with values greater than 3 are called leptokurtic (slim or long-tailed). See Figure A.3. A 
PDF with a kurtosis value of 3 is known as mesokurtic, of which the normal distribution 
is the prime example. (See the discussion of the normal distribution in Section A.6.) 

We will show shortly how the measures of skewness and kurtosis can be combined 
to determine whether a random variable follows a normal distribution. Recall that our 
hypothesis-testing procedure, as in the t and F tests, is based on the assumption (at least in 
small or finite samples) that the underlying distribution of the variable (or sample statistic) 
is normal. It is therefore very important to find out in concrete applications whether this 
assumption is fulfilled. 
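To make the skewness and kurtosis measures concrete, the sketch below applies them to the distribution of the sum of two dice from Example 2, an illustration of our own choosing. The distribution is symmetric, so its third central moment (and hence S) is zero, and its kurtosis, 414/175 ≈ 2.37, is below 3, which makes it platykurtic by the classification above.

```python
from fractions import Fraction

# PDF of the sum of two fair dice (Example 2): f(s) = (6 - |s - 7|) / 36
pdf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}


def central_moment(pdf, r):
    """E(X - mu)^r for a discrete PDF {x: f(x)}."""
    mu = sum(x * p for x, p in pdf.items())
    return sum((x - mu) ** r * p for x, p in pdf.items())


var = central_moment(pdf, 2)              # second moment about the mean
skew_num = central_moment(pdf, 3)         # numerator of S; zero, so S = 0
kurt = central_moment(pdf, 4) / var ** 2  # K = E(X - mu)^4 / [E(X - mu)^2]^2
print(var, skew_num, kurt)                # 35/6 0 414/175
```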


A.6 Some Important Theoretical Probability Distributions 


In the text extensive use is made of the following probability distributions. 


Normal Distribution 


The best known of all the theoretical probability distributions is the normal distribution, 
whose bell-shaped picture is familiar to anyone with a modicum of statistical knowledge. 

A (continuous) random variable X is said to be normally distributed if its PDF has the 
following form: 

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2}\right\} \qquad -\infty < x < \infty$$
FIGURE A.4 Areas under the normal curve. 



where μ and σ², known as the parameters of the distribution, are, respectively, the mean 
and the variance of the distribution. The properties of this distribution are as follows: 

1. It is symmetrical around its mean value. 

2. Approximately 68 percent of the area under the normal curve lies between the values of 
μ ± σ, about 95 percent of the area lies between μ ± 2σ, and about 99.7 percent of the 
area lies between μ ± 3σ, as shown in Figure A.4. 

3. The normal distribution depends on the two parameters μ and σ², so once these are 
specified, one can find the probability that X will lie within a certain interval by using the 
PDF of the normal distribution. But this task can be lightened considerably by referring 
to Table D.1 of Appendix D. To use this table, we convert the given normally distributed 
variable X with mean μ and variance σ² into a standardized normal variable Z by the following 
transformation: 

$$Z = \frac{x - \mu}{\sigma}$$

An important property of any standardized variable is that its mean value is zero and its 
variance is unity. Thus Z has zero mean and unit variance. Substituting z into the normal 
PDF given previously, we obtain 

$$f(Z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}Z^2\right)$$

which is the PDF of the standardized normal variable. The probabilities given in 
Appendix D, Table D.1, are based on this standardized normal variable. 

By convention, we denote a normally distributed variable as 

$$X \sim N(\mu, \sigma^2)$$

where ~ means "distributed as," N stands for the normal distribution, and the quantities 
in the parentheses are the two parameters of the normal distribution, namely, the mean 
and the variance. Following this convention, 

$$X \sim N(0, 1)$$

means X is a normally distributed variable with zero mean and unit variance. In other 
words, it is a standardized normal variable Z. 
EXAMPLE 18 Assume that X ~ N(8, 4). What is the probability that X will assume a value between 
$X_1 = 4$ and $X_2 = 12$? To compute the required probability, we compute the Z values as 

$$Z_1 = \frac{X_1 - \mu}{\sigma} = \frac{4 - 8}{2} = -2$$

$$Z_2 = \frac{X_2 - \mu}{\sigma} = \frac{12 - 8}{2} = +2$$

Now from Table D.1 we observe that Pr(0 < Z < 2) = 0.4772. Then, by symmetry, 
we have Pr(−2 < Z < 0) = 0.4772. Therefore, the required probability is 0.4772 + 
0.4772 = 0.9544. (See Figure A.4.) 



EXAMPLE 19 

What is the probability that in the preceding example X exceeds 12? 

The probability that X exceeds 12 is the same as the probability that Z exceeds 2. From 
Table D.1 it is obvious that this probability is (0.5 − 0.4772) or 0.0228. 


4. Let $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ and assume that they are independent. Now 
consider the linear combination 

$$Y = aX_1 + bX_2$$

where a and b are constants. Then it can be shown that 

$$Y \sim N[(a\mu_1 + b\mu_2),\ (a^2\sigma_1^2 + b^2\sigma_2^2)]$$

This result, which states that a linear combination of normally distributed variables is 
itself normally distributed, can be easily generalized to a linear combination of more 
than two normally distributed variables. 

5. Central limit theorem. Let $X_1, X_2, \ldots, X_n$ denote n independent random variables, all 
of which have the same PDF with mean = μ and variance = σ². Let $\bar{X} = \sum X_i/n$ (i.e., 
the sample mean). Then as n increases indefinitely (i.e., n → ∞), 

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{approximately}$$

That is, $\bar{X}$ approaches the normal distribution with mean μ and variance σ²/n. Notice 
that this result holds true regardless of the form of the PDF. As a result, it follows that 

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1)$$

That is, Z is a standardized normal variable. 

6. The third and fourth moments of the normal distribution around the mean value are as 
follows: 

Third moment: $E(X - \mu)^3 = 0$ 

Fourth moment: $E(X - \mu)^4 = 3\sigma^4$ 

Note: All odd-powered moments about the mean value of a normally distributed variable 
are zero. 

7. As a result, and following the measures of skewness and kurtosis discussed earlier, for a 
normal PDF skewness = 0 and kurtosis = 3; that is, a normal distribution is symmetric 







Appendix A A Review of Some Statistical Concepts 819 


and mesokurtic. Therefore, a simple test of normality is to find out whether the computed values of skewness and kurtosis depart from the norms of 0 and 3. This is in fact the logic underlying the Jarque-Bera (JB) test of normality discussed in the text:

$$JB = n\left[\frac{S^2}{6} + \frac{(K - 3)^2}{24}\right] \qquad (5.12.1)$$

where n is the sample size, S stands for skewness, and K for kurtosis. Under the null hypothesis of normality, JB is asymptotically distributed as a chi-square statistic with 2 df.
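Below is a minimal sketch of the JB statistic in (5.12.1). It uses divide-by-n sample moments for S and K, and statistical packages may apply slightly different small-sample conventions. The helper name `jarque_bera` and the two simulated data sets are hypothetical. The p-value is computed without tables because a chi-square variable with 2 df has upper-tail probability $e^{-x/2}$.

```python
import math
import random

def jarque_bera(data):
    """Return (JB, S, K, asymptotic p-value) for a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    s = m3 / m2 ** 1.5   # skewness S
    k = m4 / m2 ** 2     # kurtosis K
    jb = n * (s ** 2 / 6 + (k - 3) ** 2 / 24)
    p_value = math.exp(-jb / 2)   # P(chi-square with 2 df > JB)
    return jb, s, k, p_value

rng = random.Random(3)
normal_data = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # should look normal
skewed_data = [rng.expovariate(1.0) for _ in range(2000)]  # skewed, heavy right tail
for name, data in [("normal", normal_data), ("exponential", skewed_data)]:
    jb, s, k, p = jarque_bera(data)
    print(f"{name:12s} S={s:6.3f} K={k:6.3f} JB={jb:9.2f} p={p:.4f}")
```

For the normal draws, JB should usually be small. For the exponential draws (skewness about 2, kurtosis about 9), it should be very large, and normality is rejected.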

8. The mean and the variance of a normally distributed random variable are independent in 
that one is not a function of the other. 

9. If X and Y are jointly normally distributed, then they are independent if, and only if, the covariance between them [i.e., cov (X, Y)] is zero. (See Exercise 4.1.)

The χ² (Chi-Square) Distribution

Let $Z_1, Z_2, \ldots, Z_k$ be independent standardized normal variables (i.e., normal variables with zero mean and unit variance). Then the quantity

$$Z = \sum_{i=1}^{k} Z_i^2$$

is said to possess the χ² distribution with k degrees of freedom (df), where the term df means the number of independent quantities in the previous sum. A chi-square-distributed variable is denoted by $\chi^2_k$, where the subscript k indicates the df. Geometrically, the chi-square distribution appears in Figure A.5.

Properties of the χ² distribution are as follows:

1. As Figure A.5 shows, the χ² distribution is a skewed distribution, the degree of the skewness depending on the df. For comparatively few df, the distribution is highly skewed to the right; but as the number of df increases, the distribution becomes increasingly symmetrical. As a matter of fact, for df in excess of 100, the variable

$$\sqrt{2\chi^2} - \sqrt{2k - 1}$$

can be treated as a standardized normal variable, where k is the df.


FIGURE A.5 Density function of the χ² variable.
2. The mean of the chi-square distribution is k, and its variance is 2k, where k is the df.

3. If $Z_1$ and $Z_2$ are two independent chi-square variables with $k_1$ and $k_2$ df, then the sum $Z_1 + Z_2$ is also a chi-square variable with df = $k_1 + k_2$.
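Properties 2 and 3 can be checked by building chi-square draws exactly as in the definition, that is, as sums of k squared standard normals. The following sketch assumes k = 10, a 3 + 7 df split for property 3, and 20,000 replications. These are arbitrary choices, and the helpers `chi_square_draws` and `mean_and_variance` are hypothetical.

```python
import random

def chi_square_draws(k, reps=20_000, seed=4):
    """Each draw is Z_1^2 + ... + Z_k^2 with independent Z_i ~ N(0, 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(reps)]

def mean_and_variance(values):
    """Sample mean and divide-by-n sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

# Property 2: mean k and variance 2k
k = 10
mean, var = mean_and_variance(chi_square_draws(k))
print(f"k = {k}: mean {mean:.2f} (theory {k}), variance {var:.2f} (theory {2 * k})")

# Property 3: chi-square(3) plus an independent chi-square(7) should behave like chi-square(10)
combo = [a + b for a, b in zip(chi_square_draws(3, seed=5), chi_square_draws(7, seed=6))]
combo_mean, combo_var = mean_and_variance(combo)
print(f"3 + 7 df: mean {combo_mean:.2f}, variance {combo_var:.2f}")
```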


EXAMPLE 20 What is the probability of obtaining a χ² value of 40 or greater, given the df of 20?

As Table D.4 shows, the probability of obtaining a χ² value of 39.9968 or greater (20 df) is 0.005. Therefore, the probability of obtaining a χ² value of 40 or greater is less than 0.005, a rather small probability.
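When the df is even, the chi-square upper tail has a closed form that needs no tables:

$$P(\chi^2_k > x) = e^{-x/2}\sum_{j=0}^{k/2-1}\frac{(x/2)^j}{j!}$$

This is a standard identity linking the chi-square and Poisson distributions. The sketch below applies it to Example 20. The helper name `chi2_sf_even_df` is hypothetical, and the formula does not apply to odd df.

```python
import math

def chi2_sf_even_df(x, k):
    """Upper-tail probability P(chi-square with k df > x), valid only for even k."""
    if k <= 0 or k % 2:
        raise ValueError("this closed form requires a positive even number of df")
    half = x / 2.0
    term = 1.0    # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, k // 2):
        term *= half / j          # update to (x/2)^j / j!
        total += term
    return math.exp(-half) * total

p = chi2_sf_even_df(40.0, 20)
print(f"P(chi2 with 20 df > 40) = {p:.6f}")
```

The result is just below 0.005, in line with the table-based conclusion.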


Student's t Distribution

If $Z_1$ is a standardized normal variable [that is, $Z_1 \sim N(0, 1)$] and another variable $Z_2$ follows the chi-square distribution with k df and is distributed independently of $Z_1$, then the variable defined as

$$t = \frac{Z_1}{\sqrt{Z_2/k}} = \frac{Z_1\sqrt{k}}{\sqrt{Z_2}}$$

follows Student's t distribution with k df. A t-distributed variable is often designated as $t_k$, where the subscript k denotes the df. Geometrically, the t distribution is shown in Figure A.6.

Properties of the Student’s t distribution are as follows: 

1. As Figure A.6 shows, the t distribution, like the normal distribution, is symmetrical, but it is flatter than the normal distribution. But as the df increase, the t distribution approximates the normal distribution.

2. The mean of the t distribution is zero (for k > 1), and its variance is k/(k − 2) (for k > 2).

The t distribution is tabulated in Table D.2. 


EXAMPLE 21 Given df = 13, what is the probability of obtaining a t value (a) of about 3 or greater, (b) of about −3 or smaller, and (c) of |t| of about 3 or greater, where |t| means the absolute value (i.e., disregarding the sign) of t?

From Table D.2, the answers are (a) about 0.005, (b) about 0.005 because of the symmetry of the t distribution, and (c) about 0.01 = 2(0.005).
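Table D.2 lists only selected critical values. The sketch below shows one way to get the probabilities in Example 21 directly, assuming only the Python standard library. It integrates the t density numerically with Simpson's rule and obtains the tail from the symmetry of the distribution about zero. The names `t_pdf` and `t_upper_tail` are hypothetical helpers, and 2,000 Simpson steps is an arbitrary accuracy choice.

```python
import math

def t_pdf(t, k):
    """Density of Student's t distribution with k df."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c) * (1 + t * t / k) ** (-(k + 1) / 2)

def t_upper_tail(x, k, steps=2000):
    """P(t > x) for x >= 0: 0.5 minus the area from 0 to x (Simpson's rule, steps even)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.5
    h = x / steps
    s = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 - s * h / 3

k = 13
right = t_upper_tail(3.0, k)    # (a) P(t >= 3)
left = right                    # (b) P(t <= -3), by symmetry
two_sided = 2 * right           # (c) P(|t| >= 3)
print(f"(a) {right:.4f}  (b) {left:.4f}  (c) {two_sided:.4f}")
```

The tail comes out slightly above 0.005, since the tabulated 0.005 critical value for 13 df is a little larger than 3. With very large df, `t_upper_tail(1.96, k)` should approach the normal tail of about 0.025.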


FIGURE A.6 Student's t distribution for selected degrees of freedom.

FIGURE A.7 F distribution for various degrees of freedom.
The F Distribution 

If $Z_1$ and $Z_2$ are independently distributed chi-square variables with $k_1$ and $k_2$ df, respectively, the variable

$$F = \frac{Z_1/k_1}{Z_2/k_2}$$

follows (Fisher's) F distribution with $k_1$ and $k_2$ df. An F-distributed variable is denoted by $F_{k_1,k_2}$, where the subscripts indicate the df associated with the two Z variables, $k_1$ being called the numerator df and $k_2$ the denominator df. Geometrically, the F distribution is shown in Figure A.7.

The F distribution has the following properties: 

1. Like the chi-square distribution, the F distribution is skewed to the right. But it can be shown that as $k_1$ and $k_2$ become large, the F distribution approaches the normal distribution.

2. The mean value of an F-distributed variable is $k_2/(k_2 - 2)$, which is defined for $k_2 > 2$, and its variance is

$$\frac{2k_2^2(k_1 + k_2 - 2)}{k_1(k_2 - 2)^2(k_2 - 4)}$$

which is defined for $k_2 > 4$.

3. The square of a t-distributed random variable with k df has an F distribution with 1 and k df. Symbolically,

$$t_k^2 = F_{1,k}$$


EXAMPLE 22 Given $k_1$ = 10 and $k_2$ = 8, what is the probability of obtaining an F value (a) of 3.4 or greater and (b) of 5.8 or greater?

As Table D.3 shows, these probabilities are (a) approximately 0.05 and (b) approximately 0.01.
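The same numerical-integration idea used for the t distribution can reproduce Example 22 without Table D.3. The sketch below assumes the standard F density and integrates it from 0 to x with Simpson's rule, taking the tail as 1 minus that area. The helpers `f_pdf` and `f_upper_tail` are hypothetical, and the restriction k1 > 2 (so the density is zero and well behaved at the origin) is a simplifying assumption of this sketch.

```python
import math

def f_pdf(x, k1, k2):
    """Density of the F distribution with k1 (numerator) and k2 (denominator) df."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((k1 + k2) / 2) - math.lgamma(k1 / 2) - math.lgamma(k2 / 2)
             + (k1 / 2) * math.log(k1 / k2))
    return math.exp(log_c + (k1 / 2 - 1) * math.log(x)
                    - ((k1 + k2) / 2) * math.log(1 + k1 * x / k2))

def f_upper_tail(x, k1, k2, steps=4000):
    """P(F > x) = 1 - area under the density on [0, x] (Simpson's rule, steps even)."""
    if k1 <= 2:
        raise ValueError("this sketch assumes k1 > 2")
    h = x / steps
    s = f_pdf(0.0, k1, k2) + f_pdf(x, k1, k2)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f_pdf(i * h, k1, k2)
    return 1.0 - s * h / 3

p_a = f_upper_tail(3.4, 10, 8)   # (a) P(F >= 3.4)
p_b = f_upper_tail(5.8, 10, 8)   # (b) P(F >= 5.8)
print(f"(a) {p_a:.4f}  (b) {p_b:.4f}")
```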
4. If the denominator df, $k_2$, is fairly large, the following relationship holds between the F and the chi-square distributions:

$$k_1 F \sim \chi^2_{k_1} \qquad \text{(approximately, as } k_2 \to \infty\text{)}$$

That is, for fairly large denominator df, the numerator df times the F value is approximately the same as a chi-square value with numerator df.


EXAMPLE 23 Let $k_1$ = 20 and $k_2$ = 120. The 5 percent critical F value for these df is 1.66. Therefore, $k_1F$ = (20)(1.66) = 33.2. From the chi-square distribution for 20 df, the 5 percent critical chi-square value is about 31.41. The two values are reasonably close, and the gap narrows as $k_2$ increases further.


In passing, note that since for large df the t, chi-square, and F distributions approach the 
normal distribution, these three distributions are known as the distributions related to the 
normal distribution. 

The Bernoulli Distribution

A random variable X is said to follow a distribution named after Bernoulli (a Swiss mathematician) if its probability density (or mass) function (PDF) is:

P(X = 0) = 1 − p
P(X = 1) = p

where p, 0 < p < 1, is the probability that some event is a "success," such as the probability of obtaining a head in a toss of a coin. For such a variable,

E(X) = [1 × P(X = 1) + 0 × P(X = 0)] = p
var (X) = pq

where q = (1 − p), that is, the probability of a "failure."
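The result var(X) = pq follows by applying the definition of variance to the two-point PDF: (1 − p)²p + (0 − p)²q = q²p + p²q = pq(q + p) = pq. The small sketch below carries out that computation. The function name `bernoulli_moments` and the value p = 0.3 are illustrative assumptions.

```python
def bernoulli_moments(p):
    """Mean and variance of a Bernoulli(p) variable, computed from its two-point PDF."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    pmf = {0: 1.0 - p, 1: p}
    mean = sum(x * prob for x, prob in pmf.items())
    var = sum((x - mean) ** 2 * prob for x, prob in pmf.items())
    return mean, var

mean, var = bernoulli_moments(0.3)
print(mean, var)   # var should equal p*q = 0.3 * 0.7
```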

Binomial Distribution

The binomial distribution is the generalization of the Bernoulli distribution. Let n denote the number of independent trials, each of which results in a "success" with probability p and a "failure" with a probability q = (1 − p). If X represents the number of successes in the n trials, then X is said to follow the binomial distribution whose PDF is:

$$f(X) = \binom{n}{x} p^x (1 - p)^{n - x}$$

where x represents the number of successes in n trials and where

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

where n!, read as n factorial, means n(n − 1)(n − 2) ··· 1.

The binomial is a two-parameter distribution, n and p. For this distribution,

E(X) = np

var (X) = np(1 − p) = npq
For example, if you toss a coin 100 times and want to find out the probability of obtaining 60 heads, you put p = 0.5, n = 100, and x = 60 in the above formula. Computer routines exist to evaluate such probabilities.

You can see how the binomial distribution is a generalization of the Bernoulli distribution: with n = 1, the binomial PDF reduces to the Bernoulli PDF, and np and npq reduce to p and pq.
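Here is one such routine, sketched with Python's `math.comb` (Python 3.8+). It evaluates the coin example and also recovers np and npq by summing over the full PDF. The helper name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Probability of exactly 60 heads in 100 tosses of a fair coin
p60 = binomial_pmf(60, 100, 0.5)
print(f"P(60 heads in 100 tosses) = {p60:.5f}")

# Mean and variance from the full PDF, to compare with np and npq
n, p = 100, 0.5
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
print(f"mean {mean:.4f} (np = {n * p}), variance {var:.4f} (npq = {n * p * (1 - p)})")
```

The probability of exactly 60 heads is about 0.0108, which is small but not negligible.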

The Poisson Distribution 

A random variable X is said to have the Poisson distribution if its PDF is:

$$f(X) = \frac{e^{-\lambda}\lambda^x}{x!} \qquad \text{for } x = 0, 1, 2, \ldots,\ \lambda > 0$$

The Poisson distribution depends on a single parameter, λ. A distinguishing feature of the Poisson distribution is that its variance is equal to its expected value, which is λ. That is,

E(X) = var (X) = λ

The Poisson model, as we saw in the chapter on nonlinear regression models, is used to 
model rare or infrequent phenomena, such as the number of phone calls received in a span 
of, say, 5 minutes, or the number of speeding tickets received in a span of an hour, or the 
number of patents received by a firm, say, in a year. 
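The equality of mean and variance can be checked by summing over the Poisson PDF. The sketch below assumes λ = 3 (an arbitrary value) and truncates the infinite support at x = 99, where the omitted probability mass is negligible. The helper `poisson_pmf` is hypothetical.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam**x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 3.0
support = range(100)  # truncation; the omitted mass is negligible for lam = 3
total = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)
print(f"total probability {total:.6f}, mean {mean:.6f}, variance {var:.6f}")
```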

A.7 Statistical Inference: Estimation 


In Section A.6 we considered several theoretical probability distributions. Very often we know or are willing to assume that a random variable X follows a particular probability distribution but do not know the value(s) of the parameter(s) of the distribution. For example, if X follows the normal distribution, we may want to know the value of its two parameters, namely, the mean and the variance. To estimate the unknowns, the usual procedure is to assume that we have a random sample of size n from the known probability distribution and use the sample data to estimate the unknown parameters.⁵ This is known as the problem of estimation. In this section, we take a closer look at this problem. The problem of estimation can be broken down into two categories: point estimation and interval estimation.

Point Estimation 

To fix the ideas, let X be a random variable with PDF f(x; θ), where θ is the parameter of the distribution (for simplicity of discussion only, we are assuming that there is only one unknown parameter; our discussion can be readily generalized). Assume that we know the functional form (that is, we know the theoretical PDF, such as the t distribution) but do not know the value of θ. Therefore, we draw a random sample of size n from this known PDF and then develop a function of the sample values such that

$$\hat\theta = f(x_1, x_2, \ldots, x_n)$$

provides us an estimate of the true θ. $\hat\theta$ is known as a statistic, or an estimator, and a particular numerical value taken by the estimator is known as an estimate. Note that $\hat\theta$ can be treated as a random variable because it is a function of the sample data. $\hat\theta$ provides us with a rule, or formula, that tells us how we may estimate the true θ. Thus, if we let

$$\hat\theta = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \bar X$$

where $\bar X$ is the sample mean, then $\bar X$ is an estimator of the true mean value, say, μ. If in a specific case $\bar X$ = 50, this provides an estimate of μ. The estimator $\hat\theta$ obtained previously is known as a point estimator because it provides only a single (point) estimate of θ.

⁵ Let $X_1, X_2, \ldots, X_n$ be n random variables with joint PDF $f(x_1, x_2, \ldots, x_n)$. If we can write

$$f(x_1, x_2, \ldots, x_n) = f(x_1)f(x_2)\cdots f(x_n)$$

where f(x) is the common PDF of each X, then $x_1, x_2, \ldots, x_n$ are said to constitute a random sample of size n from a population with PDF f(x).

Interval Estimation 

Instead of obtaining only a single estimate of θ, suppose we obtain two estimates of θ by constructing two estimators $\hat\theta_1(x_1, x_2, \ldots, x_n)$ and $\hat\theta_2(x_1, x_2, \ldots, x_n)$, and say with some confidence (i.e., probability) that the interval between $\hat\theta_1$ and $\hat\theta_2$ includes the true θ. Thus, in interval estimation, in contrast with point estimation, we provide a range of possible values within which the true θ may lie.

The key concept underlying interval estimation is the notion of the sampling, or probability, distribution of an estimator. For example, it can be shown that if a variable X is normally distributed, then the sample mean $\bar X$ is also normally distributed with mean = μ (the true mean) and variance = σ²/n, where n is the sample size. In other words, the sampling, or probability, distribution of the estimator $\bar X$ is $\bar X \sim N(\mu, \sigma^2/n)$. As a result, if we construct the interval

$$\bar X \pm 2\frac{\sigma}{\sqrt{n}}$$

and say that the probability is approximately 0.95, or 95 percent, that intervals like it will include the true μ, we are in fact constructing an interval estimator for μ. Note that the interval given previously is random since it is based on $\bar X$, which will vary from sample to sample.

More generally, in interval estimation we construct two estimators $\hat\theta_1$ and $\hat\theta_2$, both functions of the sample X values, such that

$$\Pr(\hat\theta_1 \le \theta \le \hat\theta_2) = 1 - \alpha \qquad 0 < \alpha < 1$$

That is, we can state that the probability is 1 − α that the interval from $\hat\theta_1$ to $\hat\theta_2$ contains the true θ. This interval is known as a confidence interval of size 1 − α for θ, 1 − α being known as the confidence coefficient. If α = 0.05, then 1 − α = 0.95. This means that if we construct a confidence interval with a confidence coefficient of 0.95, then in repeated such constructions resulting from repeated sampling we shall be right in about 95 out of 100 cases if we maintain that the interval contains the true θ. When the confidence coefficient is 0.95, we often say that we have a 95 percent confidence interval. In general, if the confidence coefficient is 1 − α, we say that we have a 100(1 − α)% confidence interval. Note that α is known as the level of significance, or the probability of committing a type I error. This topic is discussed in Section A.8.


EXAMPLE 24 Suppose that the height of men in a population is normally distributed with mean = μ inches and σ = 2.5 inches. A sample of 100 men drawn randomly from this population had an average height of 67 inches. Establish a 95 percent confidence interval for the mean height (= μ) in the population as a whole.

As noted, $\bar X \sim N(\mu, \sigma^2/n)$, which in this case becomes $\bar X \sim N(\mu, 2.5^2/100)$. From Table D.1 one can see that

$$\bar X - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 1.96\frac{\sigma}{\sqrt{n}}$$

covers 95 percent of the area under the normal curve. Therefore, the preceding interval provides a 95 percent confidence interval for μ. Plugging in the given values of $\bar X$, σ, and n [that is, 67 ± 1.96(2.5/√100) = 67 ± 0.49], we obtain the 95 percent confidence interval as

66.51 ≤ μ ≤ 67.49

In repeated such measurements, intervals thus established will include the true μ with 95 percent confidence. A technical point may be noted here. Although we can say that the probability that the random interval $[\bar X \pm 1.96(\sigma/\sqrt{n})]$ includes μ is 95 percent, we cannot say that the probability is 95 percent that the particular interval (66.51, 67.49) includes μ. Once this interval is fixed, the probability that it will include μ is either 0 or 1. What we can say is that if we construct 100 such intervals, about 95 out of the 100 intervals will include the true μ; we cannot guarantee that one particular interval will necessarily include μ.
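The sketch below reproduces the interval in Example 24 and then illustrates the repeated-sampling interpretation. It draws many samples of 100 heights, builds $\bar X \pm 1.96\sigma/\sqrt{n}$ for each, and counts how often the interval covers the true mean. The "true" mean of 67.2 inches is a hypothetical value (in practice μ is unknown), and `coverage` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist

sigma, n, xbar = 2.5, 100, 67.0
z_crit = NormalDist().inv_cdf(0.975)          # about 1.96
half_width = z_crit * sigma / math.sqrt(n)    # about 0.49
ci = (xbar - half_width, xbar + half_width)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")

def coverage(mu_true, reps=2000, seed=5):
    """Share of simulated samples whose interval Xbar +/- half_width contains mu_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(mu_true, sigma) for _ in range(n)) / n
        hits += abs(sample_mean - mu_true) <= half_width
    return hits / reps

print("coverage:", coverage(mu_true=67.2))   # hypothetical true mean
```

Any single simulated interval either contains 67.2 or it does not. Only the long-run share should be close to 0.95.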


Methods of Estimation 

Broadly speaking, there are three methods of parameter estimation: (1) least squares (LS), (2) maximum likelihood (ML), and (3) method of moments (MOM) and its extension, the generalized method of moments (GMM). We have devoted considerable time to illustrating the LS method. In Chapter 4 we introduced the ML method in the regression context, but the method is of much broader application.

The key idea behind the ML is the likelihood function. To illustrate this, suppose the random variable X has PDF f(X; θ), which depends on a single parameter θ. We know the PDF (e.g., Bernoulli or binomial) but do not know the parameter value. Suppose we obtain a random sample of n X values. The joint PDF of these n values is:

$$g(x_1, x_2, \ldots, x_n; \theta)$$

Because it is a random sample, we can write the preceding joint PDF as a product of the individual PDFs as

$$g(x_1, x_2, \ldots, x_n; \theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta)$$

The joint PDF has a dual interpretation. If θ is known, we interpret it as the joint probability of observing the given sample values. On the other hand, we can treat it as a function of θ for given values of $x_1, x_2, \ldots, x_n$. On the latter interpretation, we call the joint PDF the likelihood function (LF) and write it as

$$L(\theta; x_1, x_2, \ldots, x_n) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta)$$

Observe the role reversal of θ in the joint probability density function and the likelihood function.

The ML estimator of θ is that value of θ that maximizes the (sample) likelihood function, L. For mathematical convenience, we often take the log of the likelihood, called the log-likelihood function (log L). Since the logarithm is a monotonically increasing function, maximizing log L yields the same θ as maximizing L. Following the calculus rules of maximization, we differentiate the log-likelihood function with respect to the unknown parameter and equate the resulting derivative to zero. The resulting value of the estimator is called the maximum-likelihood estimator. One can apply the second-order condition of maximization to ensure that the value we have obtained is in fact the maximum value.

In case there is more than one unknown parameter, we differentiate the log-likelihood function with respect to each unknown, set the resulting expressions to zero, and solve them simultaneously to obtain the values of the unknown parameters. We have already shown this for the multiple regression model (see Chapter 4, Appendix 4A.1).
EXAMPLE 25

Assume that the random variable X follows the Poisson distribution with the mean value of λ. Suppose $x_1, x_2, \ldots, x_n$ are independent Poisson random variables each with mean λ. Suppose we want to find out the ML estimator of λ. The likelihood function here is:

$$L(x_1, x_2, \ldots, x_n; \lambda) = \frac{e^{-\lambda}\lambda^{x_1}}{x_1!} \cdot \frac{e^{-\lambda}\lambda^{x_2}}{x_2!} \cdots \frac{e^{-\lambda}\lambda^{x_n}}{x_n!}$$

This is a rather unwieldy expression, but if we take its log, it becomes

$$\log L(x_1, x_2, \ldots, x_n; \lambda) = -n\lambda + \left(\sum x_i\right)\log\lambda - \log c$$

where $c = \prod x_i!$. Differentiating the preceding expression with respect to λ, we obtain $-n + \left(\sum x_i\right)/\lambda$. By setting this last expression to zero, we obtain $\hat\lambda_{ML} = \left(\sum x_i\right)/n = \bar X$, which is the ML estimator of the unknown λ.
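The closed-form result can be checked numerically. The sketch evaluates the log-likelihood from Example 25 on a grid of λ values and confirms that it peaks at the sample mean. The ten counts are hypothetical data, `poisson_loglik` is a hypothetical helper, and `math.lgamma(x + 1)` is used for log(x!). A grid search like this is only a sanity check, not how ML estimates are normally computed.

```python
import math

def poisson_loglik(lam, xs):
    """log L = -n*lam + (sum of x) * log(lam) - log c, with c = x_1! x_2! ... x_n!."""
    log_c = sum(math.lgamma(x + 1) for x in xs)
    return -len(xs) * lam + sum(xs) * math.log(lam) - log_c

xs = [2, 0, 3, 1, 4, 2, 2, 5, 1, 3]           # hypothetical counts
lam_ml = sum(xs) / len(xs)                    # closed form: the sample mean
grid = [i / 100 for i in range(1, 1001)]      # lambda = 0.01, 0.02, ..., 10.00
lam_grid = max(grid, key=lambda lam: poisson_loglik(lam, xs))
print(f"closed-form ML estimate {lam_ml}, grid maximizer {lam_grid}")
```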


The Method of Moments 

We have given a glimpse of MOM in Exercise 3.4 in the so-called analogy principle in 
which the sample moments try to duplicate the properties of their population counterparts. 
The generalized method of moments (GMM), which is a generalization of MOM, is now 
becoming more popular, but not at the introductory level. Hence we will not pursue it here. 

The desirable statistical properties of estimators fall into two categories: small-sample, or finite-sample, properties and large-sample, or asymptotic, properties. Underlying both of these sets of properties is the notion that an estimator has a sampling, or probability, distribution.

Small-Sample Properties 

Unbiasedness 

An estimator $\hat\theta$ is said to be an unbiased estimator of θ if the expected value of $\hat\theta$ is equal to the true θ; that is,

$$E(\hat\theta) = \theta$$

or

$$E(\hat\theta) - \theta = 0$$

If this equality does not hold, then the estimator is said to be biased, and the bias is calculated as

$$\text{bias}(\hat\theta) = E(\hat\theta) - \theta$$

Of course, if $E(\hat\theta) = \theta$, that is, if $\hat\theta$ is an unbiased estimator, the bias is zero.

Geometrically, the situation is as depicted in Figure A.8. In passing, note that unbiasedness is a property of repeated sampling, not of any given sample: Keeping the sample size fixed, we draw several samples, each time obtaining an estimate of the unknown parameter. The average value of these estimates is expected to be equal to the true value if the estimator is unbiased.

Minimum Variance 

$\hat\theta_1$ is said to be a minimum-variance estimator of θ if the variance of $\hat\theta_1$ is smaller than or at most equal to the variance of $\hat\theta_2$, which is any other estimator of θ. Geometrically, we have Figure A.9, which shows three estimators of θ, namely $\hat\theta_1$, $\hat\theta_2$, and $\hat\theta_3$, and their probability distributions. As shown, the variance of $\hat\theta_3$ is smaller than that of either $\hat\theta_1$ or $\hat\theta_2$. Hence, assuming only the three possible estimators, in this case $\hat\theta_3$ is a minimum-variance estimator. But note that $\hat\theta_3$ is a biased estimator (why?).

FIGURE A.8 Biased and unbiased estimators: $E(\hat\theta_1) = \theta$, $E(\hat\theta_2) \ne \theta$.

FIGURE A.9 Distribution of three estimators of θ.

Best Unbiased, or Efficient, Estimator 

If $\hat\theta_1$ and $\hat\theta_2$ are two unbiased estimators of θ, and the variance of $\hat\theta_1$ is smaller than or at most equal to the variance of $\hat\theta_2$, then $\hat\theta_1$ is a minimum-variance unbiased, or best unbiased, or efficient, estimator. Thus, in Figure A.9, of the two unbiased estimators $\hat\theta_1$ and $\hat\theta_2$, $\hat\theta_1$ is best unbiased, or efficient.

Linearity 

An estimator $\hat\theta$ is said to be a linear estimator of θ if it is a linear function of the sample observations. Thus, the sample mean defined as

$$\bar X = \frac{1}{n}\sum X_i = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$$

is a linear estimator because it is a linear function of the X values.

Best Linear Unbiased Estimator (BLUE) 

If $\hat\theta$ is linear, is unbiased, and has minimum variance in the class of all linear unbiased estimators of θ, then it is called a best linear unbiased estimator, or BLUE for short.

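Minimum Mean-Square-Error (MSE) Estimator

The MSE of an estimator $\hat\theta$ is defined as

$$\text{MSE}(\hat\theta) = E(\hat\theta - \theta)^2$$

This is in contrast with the variance of $\hat\theta$, which is defined as

$$\text{var}(\hat\theta) = E[\hat\theta - E(\hat\theta)]^2$$

The difference between the two is that var($\hat\theta$) measures the dispersion of the distribution of $\hat\theta$ around its mean or expected value, whereas MSE($\hat\theta$) measures dispersion around the true value of the parameter. The relationship between the two is as follows:

$$\begin{aligned}
\text{MSE}(\hat\theta) &= E(\hat\theta - \theta)^2 \\
&= E[\hat\theta - E(\hat\theta) + E(\hat\theta) - \theta]^2 \\
&= E[\hat\theta - E(\hat\theta)]^2 + E[E(\hat\theta) - \theta]^2 + 2E\{[\hat\theta - E(\hat\theta)][E(\hat\theta) - \theta]\} \\
&= E[\hat\theta - E(\hat\theta)]^2 + E[E(\hat\theta) - \theta]^2 \qquad \text{since the last term is zero}^{6} \\
&= \text{var}(\hat\theta) + \text{bias}(\hat\theta)^2 \\
&= \text{variance of } \hat\theta \text{ plus squared bias}
\end{aligned}$$

Of course, if the bias is zero, MSE($\hat\theta$) = var($\hat\theta$).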
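The decomposition MSE = variance + squared bias can be seen in a simulation of a deliberately biased estimator. The sketch below uses the divide-by-n sample variance of normal data, whose expected value is σ²(1 − 1/n), so its bias is −σ²/n. The true variance σ² = 4, the sample size n = 10, the replication count, and the helper name `variance_estimates` are all illustrative assumptions. Over the simulated estimates, the identity holds exactly (up to rounding), because it is an algebraic property of averages.

```python
import random

def variance_estimates(n, sigma2, reps=20_000, seed=6):
    """Simulate reps values of S^2 = sum((x - xbar)^2) / n from N(0, sigma2) samples of size n."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    estimates = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sd) for _ in range(n)]
        xbar = sum(xs) / n
        estimates.append(sum((x - xbar) ** 2 for x in xs) / n)
    return estimates

sigma2, n = 4.0, 10                        # hypothetical true variance and sample size
est = variance_estimates(n, sigma2)
m = sum(est) / len(est)
var = sum((e - m) ** 2 for e in est) / len(est)
bias = m - sigma2                          # theory: -sigma2 / n = -0.4
mse = sum((e - sigma2) ** 2 for e in est) / len(est)
print(f"MSE = {mse:.4f}, var + bias^2 = {var + bias ** 2:.4f}, bias = {bias:.3f}")
```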

The minimum MSE criterion consists of choosing an estimator whose MSE is the least in a competing set of estimators. But notice that even if such an estimator is found, there is a tradeoff involved: to obtain minimum variance you may have to accept some bias. Geometrically, the situation is as shown in Figure A.10. In this figure, $\hat\theta_2$ is slightly biased, but its variance is smaller than that of the unbiased estimator $\hat\theta_1$. In practice, however, the minimum MSE criterion is used when the best unbiased criterion is incapable of producing estimators with smaller variances.

Large-Sample Properties 

It often happens that an estimator does not satisfy one or more of the desirable statistical 
properties in small samples. But as the sample size increases indefinitely, the estimator 
possesses several desirable statistical properties. These properties are known as the large-sample, or asymptotic, properties.


FIGURE A.10 Tradeoff between bias and variance.

⁶ The last term can be written as $2\{[E(\hat\theta)]^2 - [E(\hat\theta)]^2 - \theta E(\hat\theta) + \theta E(\hat\theta)\} = 0$. Also note that $E[E(\hat\theta) - \theta]^2 = [E(\hat\theta) - \theta]^2$, since the expected value of a constant is simply the constant itself.
Asymptotic Unbiasedness

An estimator $\hat\theta$ is said to be an asymptotically unbiased estimator of θ if

$$\lim_{n \to \infty} E(\hat\theta_n) = \theta$$

where $\hat\theta_n$ means that the estimator is based on a sample size of n, lim means limit, and n → ∞ means that n increases indefinitely. In words, $\hat\theta$ is an asymptotically unbiased estimator of θ if its expected, or mean, value approaches the true value as the sample size gets larger and larger. As an example, consider the following measure of the sample variance of a random variable X:

$$S^2 = \frac{\sum (X_i - \bar X)^2}{n}$$

It can be shown that

$$E(S^2) = \sigma^2\left(1 - \frac{1}{n}\right)$$

where σ² is the true variance. It is obvious that in a small sample S² is biased, but as n increases indefinitely, E(S²) approaches the true σ²; hence it is asymptotically unbiased.

Consistency 

$\hat\theta$ is said to be a consistent estimator if it approaches the true value θ as the sample size gets larger and larger. Figure A.11 illustrates this property.

In this figure we have the distribution of $\hat\theta$ based on sample sizes of 25, 50, 80, and 100. As the figure shows, $\hat\theta$ based on n = 25 is biased since its sampling distribution is not centered on the true θ. But as n increases, the distribution of $\hat\theta$ not only tends to be more closely centered on θ (i.e., $\hat\theta$ becomes less biased) but its variance also becomes smaller. If in the limit (i.e., when n increases indefinitely) the distribution of $\hat\theta$ collapses to the single point θ, that is, if the distribution of $\hat\theta$ has zero spread, or variance, we say that $\hat\theta$ is a consistent estimator of θ.


FIGURE A.11 The distribution of $\hat\theta$ as sample size increases.
More formally, an estimator $\hat\theta$ is said to be a consistent estimator of θ if the probability that the absolute value of the difference between $\hat\theta$ and θ is less than δ (an arbitrarily small positive quantity) approaches unity. Symbolically,

$$\lim_{n \to \infty} P\{|\hat\theta - \theta| < \delta\} = 1 \qquad \delta > 0$$

where P stands for probability. This is often expressed as

$$\underset{n \to \infty}{\text{plim}}\ \hat\theta = \theta$$

where plim means probability limit.

Note that the properties of unbiasedness and consistency are conceptually very different. The property of unbiasedness can hold for any sample size, whereas consistency is strictly a large-sample property.

A sufficient condition for consistency is that the bias and variance both tend to zero as the sample size increases indefinitely.⁷ Alternatively, a sufficient condition for consistency is that MSE($\hat\theta$) tends to zero as n increases indefinitely. Since MSE equals variance plus squared bias, the two conditions are equivalent. (For MSE($\hat\theta$), see the discussion presented previously.)


EXAMPLE 26 Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean μ and variance σ². Show that the sample mean $\bar X$ is a consistent estimator of μ.

From elementary statistics it is known that $E(\bar X) = \mu$ and $\text{var}(\bar X) = \sigma^2/n$. Since $E(\bar X) = \mu$ regardless of the sample size, it is unbiased. Moreover, as n increases indefinitely, var($\bar X$) tends toward zero. Hence, $\bar X$ is a consistent estimator of μ.
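Consistency in the sense of the plim definition can be illustrated by estimating $P\{|\bar X - \mu| < \delta\}$ for increasing n. The sketch below assumes exponential data with μ = 1 (a non-normal choice), δ = 0.1, and 1,000 replications per sample size. All of these are arbitrary, and `share_within` is a hypothetical helper. The estimated probability should climb toward 1 as n grows.

```python
import random

def share_within(n, delta=0.1, reps=1000, seed=7):
    """Estimate P(|Xbar - mu| < delta), where Xbar is the mean of n exponential(1) draws (mu = 1)."""
    rng = random.Random(seed)
    mu = 1.0
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        hits += abs(xbar - mu) < delta
    return hits / reps

shares = {n: share_within(n) for n in (10, 100, 1000)}
for n, share in shares.items():
    print(f"n = {n:5d}: P(|Xbar - mu| < 0.1) is about {share:.3f}")
```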


The following rules about probability limits are noteworthy. 

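1. Invariance (Slutsky property). If $\hat\theta$ is a consistent estimator of θ and if $h(\hat\theta)$ is any continuous function of $\hat\theta$, then

$$\underset{n \to \infty}{\text{plim}}\ h(\hat\theta) = h(\theta)$$

What this means is that if $\hat\theta$ is a consistent estimator of θ, then $1/\hat\theta$ is also a consistent estimator of 1/θ and that $\log(\hat\theta)$ is also a consistent estimator of log(θ). Note that this property does not hold true of the expectation operator E. That is, if $\hat\theta$ is an unbiased estimator of θ [that is, $E(\hat\theta) = \theta$], it is not true that $1/\hat\theta$ is an unbiased estimator of 1/θ. In general, $E(1/\hat\theta) \ne 1/E(\hat\theta) = 1/\theta$.

2. If b is a constant, then

$$\text{plim}\ b = b$$

That is, the probability limit of a constant is the same constant.

3. If $\hat\theta_1$ and $\hat\theta_2$ are consistent estimators, then

$$\begin{aligned}
\text{plim}(\hat\theta_1 + \hat\theta_2) &= \text{plim}\ \hat\theta_1 + \text{plim}\ \hat\theta_2 \\
\text{plim}(\hat\theta_1\hat\theta_2) &= \text{plim}\ \hat\theta_1 \cdot \text{plim}\ \hat\theta_2 \\
\text{plim}\left(\frac{\hat\theta_1}{\hat\theta_2}\right) &= \frac{\text{plim}\ \hat\theta_1}{\text{plim}\ \hat\theta_2} \qquad (\text{plim}\ \hat\theta_2 \ne 0)
\end{aligned}$$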



The last two properties, in general, do not hold true of the expectation operator E. Thus, $E(\hat\theta_1/\hat\theta_2) \ne E(\hat\theta_1)/E(\hat\theta_2)$. Similarly, $E(\hat\theta_1\hat\theta_2) \ne E(\hat\theta_1)E(\hat\theta_2)$. If, however, $\hat\theta_1$ and $\hat\theta_2$ are independently distributed, $E(\hat\theta_1\hat\theta_2) = E(\hat\theta_1)E(\hat\theta_2)$, as noted previously.

⁷ More technically, $\lim_{n \to \infty} E(\hat\theta_n) = \theta$ and $\lim_{n \to \infty} \text{var}(\hat\theta_n) = 0$.

Asymptotic Efficiency 

Let $\hat\theta$ be an estimator of θ. The variance of the asymptotic distribution of $\hat\theta$ is called the asymptotic variance of $\hat\theta$. If $\hat\theta$ is consistent and its asymptotic variance is smaller than the asymptotic variance of all other consistent estimators of θ, $\hat\theta$ is called asymptotically efficient.

Asymptotic Normality 

An estimator $\hat\theta$ is said to be asymptotically normally distributed if its sampling distribution tends to approach the normal distribution as the sample size n increases indefinitely. For example, statistical theory shows that if $X_1, X_2, \ldots, X_n$ are independent normally distributed variables with the same mean μ and the same variance σ², the sample mean $\bar X$ is also normally distributed with mean μ and variance σ²/n in small as well as large samples. But if the $X_i$ are independent with mean μ and variance σ² but are not necessarily from the normal distribution, then the sample mean $\bar X$ is asymptotically normally distributed with mean μ and variance σ²/n. That is, as the sample size n increases indefinitely, the sample mean tends to be normally distributed with mean μ and variance σ²/n. That is in fact the central limit theorem discussed previously.

A.8 Statistical Inference: Hypothesis Testing 

Estimation and hypothesis testing constitute the twin branches of classical statistical inference. Having examined the problem of estimation, we briefly look at the problem of testing statistical hypotheses.

The problem of hypothesis testing may be stated as follows. Assume that we have an rv X with a known PDF f(x; θ), where θ is the parameter of the distribution. Having obtained a random sample of size n, we obtain the point estimator $\hat\theta$. Since the true θ is rarely known, we raise the question: Is the estimator $\hat\theta$ "compatible" with some hypothesized value of θ, say, θ = θ*, where θ* is a specific numerical value of θ? In other words, could our sample have come from the PDF f(x; θ = θ*)? In the language of hypothesis testing θ = θ* is called the null (or maintained) hypothesis and is generally denoted by $H_0$. The null hypothesis is tested against an alternative hypothesis, denoted by $H_1$, which, for example, may state that θ ≠ θ*. (Note: In some textbooks, $H_0$ and $H_1$ are designated by $H_1$ and $H_2$, respectively.)

The null hypothesis and the alternative hypothesis can be simple or composite. A hypothesis is called simple if it specifies the value(s) of the parameter(s) of the distribution; otherwise it is called a composite hypothesis. Thus, if X ~ N(μ, σ²) and we state that

$$H_0: \mu = 15 \text{ and } \sigma = 2$$

it is a simple hypothesis, whereas

$$H_0: \mu = 15 \text{ and } \sigma > 2$$

is a composite hypothesis because here the value of σ is not specified.

To test the null hypothesis (i.e., to test its validity), we use the sample information to obtain what is known as the test statistic. Very often this test statistic turns out to be the point estimator of the unknown parameter. Then we try to find out the sampling, or probability, distribution of the test statistic and use the confidence interval or test of significance approach to test the null hypothesis. The mechanics are illustrated below.

To fix the ideas, let us revert to Example 24, which was concerned with the height (X) of men in a population. We are told that

$$X_i \sim N(\mu, \sigma^2) = N(\mu, 2.5^2) \qquad \bar X = 67 \qquad n = 100$$

Let us assume that

$$H_0: \mu = \mu^* = 69 \qquad H_1: \mu \ne 69$$

The question is: Could the sample with $\bar X$ = 67, the test statistic, have come from the population with the mean value of 69? Intuitively, we may not reject the null hypothesis if $\bar X$ is "sufficiently close" to μ*; otherwise we may reject it in favor of the alternative hypothesis. But how do we decide that $\bar X$ is "sufficiently close" to μ*? We can adopt two approaches, (1) confidence interval and (2) test of significance, both leading to identical conclusions in any specific application.

The Confidence Interval Approach 

Since $X_i \sim N(\mu, \sigma^2)$, we know that the test statistic $\bar X$ is distributed as

$$\bar X \sim N(\mu, \sigma^2/n)$$

Since we know the probability distribution of $\bar X$, why not establish, say, a 100(1 − α)% confidence interval for μ based on $\bar X$ and see whether this confidence interval includes μ = μ*? If it does, we may not reject the null hypothesis; if it does not, we may reject the null hypothesis. Thus, if α = 0.05, we will have a 95 percent confidence interval, and if this confidence interval includes μ*, we may not reject the null hypothesis; 95 out of 100 intervals thus established are likely to include μ*.

The actual mechanics are as follows: since $\bar X \sim N(\mu, \sigma^2/n)$, it follows that

$$Z_i = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

that is, a standard normal variable. Then from the normal distribution table we know that

$$\Pr(-1.96 \le Z_i \le 1.96) = 0.95$$

That is,

$$\Pr\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$

which, on rearrangement, gives

$$\Pr\left(\bar X - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

This is a 95 percent confidence interval for μ. Once this interval has been established, the test of the null hypothesis is simple. All that we have to do is to see whether μ = μ* lies in this interval. If it does, we may not reject the null hypothesis; if it does not, we may reject it.
Turning to our example, the 95 percent confidence interval (the same interval obtained in Example 24) is

66.51 ≤ μ ≤ 67.49

This interval obviously does not include μ = 69. Therefore, we can reject the null hypothesis that the true μ is 69 with a 95 percent confidence coefficient. Geometrically, the situation is as depicted in Figure A.12.

FIGURE A.12 95 percent confidence interval for μ.
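Both routes listed earlier can be carried out side by side for the height example, as the sketch below shows. The Z statistic anticipates the test-of-significance approach. The two-sided p-value is computed as erfc(|Z|/√2), which equals 2[1 − Φ(|Z|)] and stays accurate far out in the tail. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_star, sigma, n, xbar = 69.0, 2.5, 100, 67.0
se = sigma / math.sqrt(n)                  # standard error of Xbar: 0.25
z_stat = (xbar - mu_star) / se             # (67 - 69) / 0.25 = -8
z_crit = NormalDist().inv_cdf(0.975)       # about 1.96

# (1) Confidence-interval approach: is mu* inside Xbar +/- z_crit * se?
lower, upper = xbar - z_crit * se, xbar + z_crit * se
reject_by_ci = not (lower <= mu_star <= upper)

# (2) Test-of-significance approach: does |Z| exceed the critical value?
reject_by_z = abs(z_stat) > z_crit

# Two-sided p-value, 2 * (1 - Phi(|Z|)), via the complementary error function
p_value = math.erfc(abs(z_stat) / math.sqrt(2))
print(f"CI = ({lower:.2f}, {upper:.2f}), Z = {z_stat}, p = {p_value:.1e}")
print("reject H0:", reject_by_ci, reject_by_z)
```

The sample mean lies 8 standard errors below the hypothesized value, so both approaches reject $H_0$.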

In the language of hypothesis testing, the confidence interval that we have established is 
called the acceptance region and the area(s) outside the acceptance region is (are) called 
the critical region(s), or region(s) of rejection of the null hypothesis. The lower and upper 
limits of the acceptance region (which demarcate it from the rejection regions) are called 
the critical values. In this language of hypothesis testing, if the hypothesized value falls 
inside the acceptance region, one may not reject the null hypothesis; otherwise one may 
reject it. 

It is important to note that in deciding to reject or not reject $H_0$, we may commit one of two types of errors: (1) we may reject $H_0$ when it is, in fact, true; this is called a type I error (thus, in the preceding example $\bar{X} = 67$ could have come from the population with a mean value of 69); or (2) we may not reject $H_0$ when it is, in fact, false; this is called a type II error. Therefore, a hypothesis test does not establish the value of the true $\mu$. It merely provides a means of deciding whether we may act as if $\mu = \mu^*$.

Type I and Type II Errors 
Schematically, we have

                              State of Nature
Decision              H0 Is True          H0 Is False
Reject                Type I error        No error
Do not reject         No error            Type II error


Ideally, we would like to minimize both type I and type II errors. But unfortunately, for any given sample size, it is not possible to minimize both errors simultaneously. The classical approach to this problem, embodied in the work of Neyman and Pearson, is to assume that a type I error is likely to be more serious in practice than a type II error. Therefore, one should try to keep the probability of committing a type I error at a fairly low level, such as 0.01 or 0.05, and then try to minimize the probability of having a type II error as much as possible.


In the literature, the probability of a type I error is designated as $\alpha$ and is called the level of significance, and the probability of a type II error is designated as $\beta$. The probability of not committing a type II error, $1 - \beta$, is called the power of the test. Put differently, the power of a test is its ability to reject a false null hypothesis. The classical approach to hypothesis testing is to fix $\alpha$ at levels such as 0.01 (or 1 percent) or 0.05 (5 percent) and then try to maximize the power of the test, that is, to minimize $\beta$.

It is important that the reader understand the concept of the power of a test, which is best 
explained with an example. 8 

Let $X \sim N(\mu, 100)$; that is, $X$ is normally distributed with mean $\mu$ and variance 100. Assume that $\alpha = 0.05$. Suppose we have a sample of 25 observations, which gives a sample mean value of $\bar{X}$. Suppose further we entertain the hypothesis $H_0: \mu = 50$. Since $X$ is normally distributed, we know that the sample mean is also normally distributed as $\bar{X} \sim N(\mu, 100/25)$. Hence under the stated null hypothesis that $\mu = 50$, the 95 percent confidence interval for $\bar{X}$ is $\mu \pm 1.96\sqrt{100/25} = \mu \pm 3.92$, that is, $50 \pm 3.92$, or 46.08 to 53.92. Therefore, the critical region consists of all values of $\bar{X}$ less than 46.08 or greater than 53.92. That is, we will reject the null hypothesis that the true mean is 50 if a sample mean value is found below 46.08 or above 53.92.

But what is the probability that $\bar{X}$ will lie in the preceding critical region(s) if the true $\mu$ has a value different from 50? Suppose there are three alternative hypotheses: $\mu = 48$, $\mu = 52$, and $\mu = 56$. If any of these alternatives is true, it will be the actual mean of the distribution of $\bar{X}$. The standard error is unchanged for the three alternatives since $\sigma^2$ is still assumed to be 100.

The shaded areas in Figure A.13 show the probabilities that $\bar{X}$ will fall in the critical region if each of the alternative hypotheses is true. As you can check, these probabilities are 0.17 (for $\mu = 48$), 0.05 (for $\mu = 50$), 0.17 (for $\mu = 52$), and 0.85 (for $\mu = 56$). As you can see from this figure, whenever the true value of $\mu$ differs substantially from the hypothesis under consideration (which here is $\mu = 50$), the probability of rejecting the hypothesis is high, but when the true value is not very different from the value given under the null hypothesis, the probability of rejection is small. Intuitively, this should make sense if the null and alternative hypotheses are very closely bunched.

FIGURE A.13 Distribution of $\bar{X}$ when $N = 25$, $\sigma = 10$, and $\mu = 48$, 50, 52, or 56. Under $H_0: \mu = 50$, the critical region with $\alpha = 0.05$ is $\bar{X} < 46.1$ and $\bar{X} > 53.9$. The shaded area indicates the probability that $\bar{X}$ will fall into the critical region. This probability is 0.17 if $\mu = 48$, 0.05 if $\mu = 50$, 0.17 if $\mu = 52$, and 0.85 if $\mu = 56$.

FIGURE A.14 Power function of test of hypothesis $\mu = 50$ when $N = 25$, $\sigma = 10$, and $\alpha = 0.05$ (vertical axis: probability of rejecting $H_0$; horizontal axis: scale of $\mu$).

8 The following discussion and the figures are based on Helen M. Walker and Joseph Lev, Statistical Inference, Holt, Rinehart and Winston, New York, 1953, pp. 161-162.

This can be seen further if you consider Figure A.14, which is called the power function graph; the curve shown there is called the power curve.
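The rejection probabilities behind Figures A.13 and A.14 can be recomputed directly from the normal distribution. This is a minimal sketch under the example's assumptions ($\sigma = 10$, $n = 25$, $\alpha = 0.05$, $H_0: \mu = 50$); the function name `power` is ours, and `statistics.NormalDist` stands in for the normal table.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N, MU0, ALPHA = 10, 25, 50, 0.05       # values from the example in the text

se = SIGMA / sqrt(N)                          # standard error of the sample mean = 2
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96
lo_crit = MU0 - z_crit * se                   # about 46.08
hi_crit = MU0 + z_crit * se                   # about 53.92


def power(mu_true):
    """Probability that the sample mean falls in the critical region when mu = mu_true."""
    dist = NormalDist(mu_true, se)
    return dist.cdf(lo_crit) + (1 - dist.cdf(hi_crit))


for mu in (48, 50, 52, 56):
    print(mu, round(power(mu), 2))   # 0.17, 0.05, 0.17, 0.85
```

Evaluating `power` over a fine grid of $\mu$ values traces out the power curve; at $\mu = 50$ it equals $\alpha$, the probability of a type I error.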

The reader will by now realize that the confidence coefficient $(1 - \alpha)$ discussed earlier is simply 1 minus the probability of committing a type I error. Thus a 95 percent confidence coefficient means that we are prepared to accept at most a 5 percent probability of committing a type I error: we do not want to reject the true hypothesis more than 5 times out of 100.

The p Value, or Exact Level of Significance 

Instead of preselecting $\alpha$ at arbitrary levels, such as 1, 5, or 10 percent, one can obtain the p (probability) value, or exact level of significance, of a test statistic. The p value is defined as the lowest significance level at which a null hypothesis can be rejected.

Suppose that in an application involving 20 df we obtain a t value of 3.552. Now the p 
value, or the exact probability, of obtaining a t value of 3.552 or greater can be seen from 
Table D.2 as 0.001 (one-tailed) or 0.002 (two-tailed). We can say that the observed t value 
of 3.552 is statistically significant at the 0.001 or 0.002 level, depending on whether we are 
using a one-tail or two-tail test. 
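The table lookup can be reproduced numerically. This is an illustration only: the helper names are ours, and Simpson's rule applied to the Student t density stands in for the exact formula a statistical package would use, so the result should agree with Table D.2 to about the table's precision.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: one half minus the area from 0 to t (Simpson's rule)."""
    h = t / steps                      # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


p_one = t_upper_tail(3.552, 20)
print(f"one-tailed p = {p_one:.4f}, two-tailed p = {2 * p_one:.4f}")
```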

Several statistical packages now routinely print out the p value of the estimated test 
statistics. Therefore, the reader is advised to give the p value wherever possible. 

Sample Size and Hypothesis Tests 

In survey-type data involving hundreds of observations, the null hypothesis seems to be 
rejected more frequently than in small samples. It is worth quoting Angus Deaton here: 

As the sample size increases, and provided we are using a consistent estimation procedure, our 
estimates will be closer to the truth, and less dispersed around it, so that discrepancies that are 
undetectable with small sample size will lead to rejection in large samples. Large sample sizes 
are like greater resolving power on a telescope; features that are not visible from a distance 
become more and more sharply delineated as the magnification is turned up. 9 


9 Angus Deaton, The Analysis of Household Surveys: A Microeconometric Approach to Development Policy, The Johns Hopkins University Press, Baltimore, 2000, p. 150.
Following Leamer and Schwarz, Deaton suggests adjusting the standard critical values of the F and $\chi^2$ tests as follows: Reject the null hypothesis when the computed F value exceeds the logarithm of the sample size, that is, $\ln n$, and when the computed $\chi^2$ statistic for $q$ restrictions exceeds $q \ln n$, where $\ln$ is the natural logarithm and $n$ is the sample size. These critical values are known as Leamer-Schwarz critical values.

Using Deaton’s example, if $n = 100$, the null hypothesis would be rejected only if the computed F value were greater than 4.6 ($\ln 100 \approx 4.61$), but if $n = 10{,}000$, the null hypothesis would be rejected only when the computed F value exceeded 9.2 ($\ln 10{,}000 \approx 9.21$).
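The two adjusted critical values follow directly from the rule as stated above (F compared with $\ln n$, $\chi^2$ with $q \ln n$). A minimal sketch; the function names are ours.

```python
from math import log


def leamer_schwarz_f(n):
    """Critical value for an F statistic under the rule as stated: ln n."""
    return log(n)


def leamer_schwarz_chi2(n, q):
    """Critical value for a chi-square statistic with q restrictions: q ln n."""
    return q * log(n)


for n in (100, 10_000):
    print(n, round(leamer_schwarz_f(n), 1))   # 4.6 for n = 100, 9.2 for n = 10,000
```

Because $\ln n$ keeps growing with $n$, the bar for rejection rises with the sample size, which is the point of Deaton's telescope analogy.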

The Test of Significance Approach 

Recall that

$$Z_i = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

In any given application, $\bar{X}$ and $n$ are known (or can be estimated), but the true $\mu$ and $\sigma$ are not known. But if $\sigma$ is specified and we assume (under $H_0$) that $\mu = \mu^*$, a specific numerical value, then $Z_i$ can be directly computed and we can easily look at the normal distribution table to find the probability of obtaining the computed $Z$ value. If this probability is small, say, less than 5 percent or 1 percent, we can reject the null hypothesis: if the hypothesis were true, the chances of obtaining the particular $Z$ value would be much higher. This is the general idea behind the test of significance approach to hypothesis testing. The key idea here is the test statistic (here the $Z$ statistic) and its probability distribution under the assumed value $\mu = \mu^*$. Appropriately, in the present case, the test is known as the Z test, since we use the Z (standardized normal) value.

Returning to our example, if $\mu = \mu^* = 69$, the $Z$ statistic becomes

$$Z = \frac{\bar{X} - \mu^*}{\sigma/\sqrt{n}} = \frac{67 - 69}{2.5/\sqrt{100}} = \frac{-2}{0.25} = -8$$


If we look at the normal distribution table (Table D.1), we see that the probability of obtaining such a $Z$ value is extremely small. (Note: The probability of a $Z$ value exceeding 3, or falling below $-3$, is only about 0.001. Therefore, the probability of $Z$ falling below $-8$ is even smaller.) Therefore, we can reject the null hypothesis that $\mu = 69$; given this value, our chance of obtaining an $\bar{X}$ of 67 is extremely small. We therefore doubt that our sample came from the population with a mean value of 69. Diagrammatically, the situation is depicted in Figure A.15.
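The Z test for this example takes only a few lines. A minimal sketch: the function name `z_test` is ours, it assumes $\sigma$ is known (2.5 here), and it reports a two-tailed p value computed with `statistics.NormalDist` in place of Table D.1.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu_star, sigma, n):
    """Z statistic and two-tailed p value for H0: mu = mu_star with sigma known."""
    z = (xbar - mu_star) / (sigma / sqrt(n))
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_tailed


z, p = z_test(xbar=67, mu_star=69, sigma=2.5, n=100)
print(z)                       # -8.0
alpha = 0.05
print("reject H0" if p <= alpha else "do not reject H0")
```

Comparing `p` with `alpha` is the p-value form of the decision rule; comparing `abs(z)` with 1.96 gives the same verdict.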


FIGURE A.15 The distribution of the $Z$ statistic (critical values $-1.96$ and $1.96$ marked on either side of 0).
In the language of test of significance, when we say that a test (statistic) is significant, we generally mean that we can reject the null hypothesis. And the test statistic is regarded as significant if the probability of our obtaining it is equal to or less than $\alpha$, the probability of committing a type I error. Thus if $\alpha = 0.05$, we know that the probability of obtaining a $Z$ value of $-1.96$ or less, or of $1.96$ or more, is 5 percent (2.5 percent in each tail of the standardized normal distribution). In our illustrative example $Z$ was $-8$. Hence the probability of obtaining such a $Z$ value is much smaller than 2.5 percent, well below our prespecified probability of committing a type I error. That is why the computed value of $Z = -8$ is statistically significant; that is, we reject the null hypothesis that the true $\mu$ is 69. Of course, we reached the same conclusion using the confidence interval approach to hypothesis testing.

We now summarize the steps involved in testing a statistical hypothesis: 

Step 1. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$ (e.g., $H_0: \mu = 69$ and $H_1: \mu \ne 69$).

Step 2. Select the test statistic (e.g., $\bar{X}$).

Step 3. Determine the probability distribution of the test statistic (e.g., $\bar{X} \sim N(\mu, \sigma^2/n)$).

Step 4. Choose the level of significance $\alpha$ (i.e., the probability of committing a type I error).

Step 5. Using the probability distribution of the test statistic, establish a $100(1 - \alpha)$ percent confidence interval. If the value of the parameter under the null hypothesis (e.g., $\mu = \mu^* = 69$) lies in this confidence region, the region of acceptance, do not reject the null hypothesis. But if it falls outside this interval (i.e., it falls into the region of rejection), you may reject the null hypothesis. Keep in mind that in rejecting or not rejecting a null hypothesis you are taking a chance of being wrong $100\alpha$ percent of the time.


References 
Freund, John E., and Ronald E. Walpole, Mathematical Statistics, 3d ed., Prentice Hall,

Mood, Alexander M., Franklin A. Graybill, and Duane C. Boes, Introduction to the Theory of Statistics, 3d ed., McGraw-Hill, New York, 1974. This is a comprehensive introduction to the theory of statistics but is somewhat more difficult than the preceding two textbooks.

Newbold, Paul, Statistics for Business and Economics, Prentice Hall, Englewood Cliffs, NJ, 1984. A comprehensive nonmathematical introduction to statistics with lots of worked-out problems.



Appendix B


Rudiments of 
Matrix Algebra 


This appendix offers the essentials of matrix algebra required to understand Appendix C 
and some of the material in Chapter 18. The discussion is nonrigorous, and no proofs are 
given. For proofs and further details, the reader may consult the references. 


B.1 Definitions


Matrix 


A matrix is a rectangular array of numbers or elements arranged in rows and columns. More precisely, a matrix of order, or dimension, M by N (written as $M \times N$) is a set of $M \times N$ elements arranged in $M$ rows and $N$ columns. Thus, letting boldface letters denote matrices, an $(M \times N)$ matrix A may be expressed as

$$\underset{M \times N}{\mathbf{A}} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1N} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2N} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{M1} & a_{M2} & a_{M3} & \cdots & a_{MN} \end{bmatrix}$$

where $a_{ij}$ is the element appearing in the $i$th row and the $j$th column of A and where $[a_{ij}]$ is a shorthand expression for the matrix A whose typical element is $a_{ij}$. The order, or dimension, of a matrix, that is, the number of rows and columns, is often written underneath the matrix for easy reference.


1 5 



7 

4 

11 . 


Scalar 


A scalar is a single (real) number. Alternatively, a scalar is a 1 x 1 matrix. 


Column Vector 


A matrix consisting of M rows and only one column is called a column vector. Boldface lowercase letters are used to denote vectors.


Row Vector

A matrix consisting of only one row and N columns is called a row vector.

$$\underset{1 \times 4}{\mathbf{x}} = [1 \;\; 2 \;\; 5 \;\; -4] \qquad \underset{1 \times 5}{\mathbf{y}} = [0 \;\; 5 \;\; -9 \;\; 6 \;\; 10]$$


Transposition 

The transpose of an $M \times N$ matrix A, denoted by A' (read as A prime or A transpose), is an $N \times M$ matrix obtained by interchanging the rows and columns of A; that is, the $i$th row of A becomes the $i$th column of A'.

Since a vector is a special type of matrix, the transpose of a row vector is a column vector and the transpose of a column vector is a row vector. Thus

$$\mathbf{x} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} \quad \text{and} \quad \mathbf{x}' = [4 \;\; 5 \;\; 6]$$

We shall follow the convention of indicating the row vectors by primes.


Submatrix 

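Given any $M \times N$ matrix A, if all but $r$ rows and $s$ columns of A are deleted, the resulting matrix of order $r \times s$ is called a submatrix of A. Thus, if

$$\underset{3 \times 3}{\mathbf{A}} = \begin{bmatrix} 3 & 5 & 7 \\ 8 & 2 & 1 \\ 3 & 2 & 1 \end{bmatrix}$$

and we delete the third row and the third column of A, we obtain

$$\underset{2 \times 2}{\mathbf{B}} = \begin{bmatrix} 3 & 5 \\ 8 & 2 \end{bmatrix}$$

which is a submatrix of A whose order is $2 \times 2$.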


B.2 Types of Matrices 


Square Matrix 

A matrix that has the same number of rows as columns is called a square matrix. 


Diagonal Matrix 


A square matrix with at least one nonzero element on the main diagonal (running from the upper-left-hand corner to the lower-right-hand corner) and zeros elsewhere is called a diagonal matrix. For example,

$$\begin{bmatrix} -2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Scalar Matrix

A diagonal matrix whose diagonal elements are all equal is called a scalar matrix. An example is the variance-covariance matrix of the population disturbance of the classical linear regression model given in Equation (C.2.3), namely,

$$\text{var-cov}(\mathbf{u}) = \begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 \end{bmatrix}$$


Identity, or Unit, Matrix 

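A diagonal matrix whose diagonal elements are all 1 is called an identity, or unit, matrix and is denoted by I. It is a special kind of scalar matrix.

$$\underset{3 \times 3}{\mathbf{I}} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \underset{4 \times 4}{\mathbf{I}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$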


Symmetric Matrix 

A square matrix whose elements above the main diagonal are mirror images of the elements below the main diagonal is called a symmetric matrix. Alternatively, a symmetric matrix is such that its transpose is equal to itself; that is, A = A'. That is, $a_{ij} = a_{ji}$ for all $i$ and $j$. An example is the variance-covariance matrix given in Equation (C.2.2). Another example is the correlation matrix given in (C.5.1).


Null Matrix 

A matrix whose elements are all zero is called a null matrix and is denoted by 0. 


Null Vector 

A row or column vector whose elements are all zero is called a null vector and is also 
denoted by 0. 


Equal Matrices 

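Two matrices A and B are said to be equal if they are of the same order and their corresponding elements are equal; that is, $a_{ij} = b_{ij}$ for all $i$ and $j$. For example, the matrices

$$\underset{3 \times 3}{\mathbf{A}} = \begin{bmatrix} 3 & 4 & 5 \\ 0 & -1 & 2 \\ 5 & 1 & 3 \end{bmatrix} \quad \text{and} \quad \underset{3 \times 3}{\mathbf{B}} = \begin{bmatrix} 3 & 4 & 5 \\ 0 & -1 & 2 \\ 5 & 1 & 3 \end{bmatrix}$$

are equal; that is, A = B.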


B.3 Matrix Operations 

Matrix Addition 

Let A = $[a_{ij}]$ and B = $[b_{ij}]$. If A and B are of the same order, we define matrix addition as

$$\mathbf{A} + \mathbf{B} = \mathbf{C}$$

where C is of the same order as A and B and is obtained as $c_{ij} = a_{ij} + b_{ij}$ for all $i$ and $j$; that is, C is obtained by adding the corresponding elements of A and B. If such addition can be effected, A and B are said to be conformable for addition. For example, if

$$\mathbf{A} = \begin{bmatrix} 2 & 3 & 4 \\ 6 & 7 & 8 \end{bmatrix} \quad \text{and} \quad \mathbf{B} = \begin{bmatrix} 1 & 0 & -1 \\ 3 & 1 & 5 \end{bmatrix}$$

and C = A + B, then

$$\mathbf{C} = \begin{bmatrix} 3 & 3 & 3 \\ 9 & 8 & 13 \end{bmatrix}$$

Matrix Subtraction


Matrix subtraction follows the same principle as matrix addition except that C = A — B; 
that is, we subtract the elements of B from the corresponding elements of A to obtain C, 
provided A and B are of the same order. 


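Scalar Multiplication

To multiply a matrix A by a scalar $\lambda$ (a real number), we multiply each element of the matrix by $\lambda$:

$$\lambda\mathbf{A} = [\lambda a_{ij}]$$

For example, if $\lambda = 2$ and

$$\mathbf{A} = \begin{bmatrix} -3 & 5 \\ 8 & 7 \end{bmatrix}$$

then

$$\lambda\mathbf{A} = \begin{bmatrix} -6 & 10 \\ 16 & 14 \end{bmatrix}$$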

Matrix Multiplication 

Let A be $M \times N$ and B be $N \times P$. Then the product AB (in that order) is defined to be a new matrix C of order $M \times P$ such that

$$c_{ij} = \sum_{k=1}^{N} a_{ik} b_{kj} \qquad i = 1, 2, \ldots, M \quad j = 1, 2, \ldots, P$$

That is, the element in the $i$th row and the $j$th column of C is obtained by multiplying the elements of the $i$th row of A by the corresponding elements of the $j$th column of B and summing over all terms; this is known as the row by column rule of multiplication. Thus, to obtain $c_{11}$, the element in the first row and the first column of C, we multiply the elements in the first row of A by the corresponding elements in the first column of B and sum over all terms. Similarly, to obtain $c_{12}$, we multiply the elements in the first row of A by the corresponding elements in the second column of B and sum over all terms, and so on.

Note that for multiplication to exist, matrices A and B must be conformable with respect to multiplication; that is, the number of columns in A must be equal to the number of rows in B. If, for example,

$$\underset{2 \times 3}{\mathbf{A}} = \begin{bmatrix} 3 & 4 & 7 \\ 5 & 6 & 1 \end{bmatrix} \quad \text{and} \quad \underset{3 \times 2}{\mathbf{B}} = \begin{bmatrix} 2 & 1 \\ 3 & 5 \\ 6 & 2 \end{bmatrix}$$

then

$$\underset{2 \times 2}{\mathbf{AB}} = \begin{bmatrix} (3 \times 2) + (4 \times 3) + (7 \times 6) & (3 \times 1) + (4 \times 5) + (7 \times 2) \\ (5 \times 2) + (6 \times 3) + (1 \times 6) & (5 \times 1) + (6 \times 5) + (1 \times 2) \end{bmatrix} = \begin{bmatrix} 60 & 37 \\ 34 & 37 \end{bmatrix}$$
But if the number of columns in A is not equal to the number of rows in B, the product AB is not defined since A and B are not conformable with respect to multiplication.
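The row by column rule translates directly into a triple loop over $i$, $j$, and $k$. A minimal sketch (the name `mat_mul` is ours) that refuses non-conformable inputs and reproduces the $2 \times 3$ by $3 \times 2$ example:

```python
def mat_mul(A, B):
    """Row-by-column product of an M x N matrix A and an N x P matrix B."""
    n = len(A[0])
    if n != len(B):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    # c_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[3, 4, 7], [5, 6, 1]]        # 2 x 3
B = [[2, 1], [3, 5], [6, 2]]      # 3 x 2
print(mat_mul(A, B))              # [[60, 37], [34, 37]]
```

Calling `mat_mul(A, A)` raises `ValueError`, because a $2 \times 3$ matrix has three columns but only two rows.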

Properties of Matrix Multiplication 

1. Matrix multiplication is not necessarily commutative; that is, in general, $\mathbf{AB} \ne \mathbf{BA}$. Therefore, the order in which the matrices are multiplied is very important. AB means that A is postmultiplied by B or B is premultiplied by A.

2. Even if AB and BA exist, the resulting matrices may not be of the same order. Thus, if A is $M \times N$ and B is $N \times M$, AB is $M \times M$ whereas BA is $N \times N$, hence of different order.

3. Even if A and B are both square matrices, so that AB and BA are both defined, the resulting matrices will not necessarily be equal; in general, $\mathbf{AB} \ne \mathbf{BA}$. An example of AB = BA is when both A and B are identity matrices.

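4. A row vector postmultiplied by a column vector is a scalar. Thus, consider the ordinary least-squares residuals $\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_n$. Letting $\hat{\mathbf{u}}$ be a column vector and $\hat{\mathbf{u}}'$ be a row vector, we have

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = [\hat{u}_1 \;\; \hat{u}_2 \;\; \hat{u}_3 \;\; \cdots \;\; \hat{u}_n] \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \hat{u}_3 \\ \vdots \\ \hat{u}_n \end{bmatrix} = \hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \cdots + \hat{u}_n^2 = \sum \hat{u}_i^2$$

a scalar [see Eq. (C.3.5)].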


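5. A column vector postmultiplied by a row vector is a matrix. As an example, consider the population disturbances of the classical linear regression model, namely, $u_1, u_2, \ldots, u_n$. Letting $\mathbf{u}$ be a column vector and $\mathbf{u}'$ a row vector, we obtain

$$\mathbf{u}\mathbf{u}' = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix} [u_1 \;\; u_2 \;\; u_3 \;\; \cdots \;\; u_n] = \begin{bmatrix} u_1^2 & u_1u_2 & u_1u_3 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & u_2u_3 & \cdots & u_2u_n \\ \vdots & \vdots & \vdots & & \vdots \\ u_nu_1 & u_nu_2 & u_nu_3 & \cdots & u_n^2 \end{bmatrix}$$

which is a matrix of order $n \times n$. Note that the preceding matrix is symmetric.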

6. A matrix postmultiplied by a column vector is a column vector. 

7. A row vector postmultiplied by a matrix is a row vector. 

8. Matrix multiplication is associative; that is, (AB)C = A(BC), where A is M x N, B is 
N x P, and C is P x K. 

9. Matrix multiplication is distributive with respect to addition; that is, A(B + C) = AB + 
AC and (B + C)A = BA + CA. 
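Several of the properties listed above can be checked on small examples. This is a sketch only: the helper names are ours, and every matrix and residual vector below is hypothetical, chosen just to make each property visible.

```python
def mat_mul(A, B):
    """Row-by-column product; A is M x N and B is N x P."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*A)]


# Property 3: hypothetical square matrices whose products differ.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(mat_mul(A, B))   # [[2, 1], [4, 3]]
print(mat_mul(B, A))   # [[3, 4], [1, 2]]

# Properties 4 and 5: hypothetical residuals stored as an n x 1 column vector.
u = [[1.0], [-2.0], [0.5]]
print(mat_mul(transpose(u), u))   # [[5.25]], a 1 x 1 matrix: the sum of squares
uu = mat_mul(u, transpose(u))     # 3 x 3 matrix uu'
print(uu == transpose(uu))        # True: uu' is symmetric

# Property 8: associativity, checked on the same hypothetical matrices.
print(mat_mul(mat_mul(A, B), A) == mat_mul(A, mat_mul(B, A)))   # True
```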

Matrix Transposition 

We have already defined the process of matrix transposition as interchanging the rows and the columns of a matrix (or a vector). We now state some of the properties of transposition.

1. The transpose of a transposed matrix is the original matrix itself. Thus, (A')' = A. 

2. If A and B are conformable for addition, then C = A + B and C' = (A + B)' = A' + B'. That is, the transpose of the sum of two matrices is the sum of their transposes.

3. If AB is defined, then (AB)' = B'A'. That is, the transpose of the product of two matrices is the product of their transposes in the reverse order. This can be generalized: (ABCD)' = D'C'B'A'.

4. The transpose of an identity matrix I is the identity matrix itself; that is, I' = I.

5. The transpose of a scalar is the scalar itself. Thus, if $\lambda$ is a scalar, $\lambda' = \lambda$.

6. The transpose of $(\lambda\mathbf{A})$ is $\lambda\mathbf{A}'$, where $\lambda$ is a scalar. [Note: $(\lambda\mathbf{A})' = \mathbf{A}'\lambda' = \mathbf{A}'\lambda = \lambda\mathbf{A}'$.]

7. If A is a square matrix such that A = A', then A is a symmetric matrix. (See the definition of symmetric matrix given in Section B.2.)
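Property 3 is easy to get backwards, so a numerical check helps. A minimal sketch with hypothetical conformable matrices (A is $2 \times 3$, B is $3 \times 2$); the helper names are ours.

```python
def transpose(A):
    """Interchange rows and columns: row i of A becomes column i of A'."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Row-by-column product of conformable matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical conformable matrices: A is 2 x 3 and B is 3 x 2.
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [2, 1], [0, 3]]

print(transpose(transpose(A)) == A)                                      # property 1: True
print(transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A)))  # property 3: True
print(transpose(mat_mul(A, B)) == mat_mul(transpose(A), transpose(B)))  # wrong order: False
```

In the wrong order, A'B' is $3 \times 3$ while (AB)' is $2 \times 2$, so the two cannot even be of the same order.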

Matrix Inversion 

An inverse of a square matrix A, denoted by $\mathbf{A}^{-1}$ (read A inverse), if it exists, is a unique square matrix such that

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$

where I is an identity matrix whose order is the same as that of A.

We shall see how $\mathbf{A}^{-1}$ is computed after we study the topic of determinants. In the meantime, note these properties of the inverse.

1. $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$; that is, the inverse of the product of two matrices is the product of their inverses in the reverse order.

2. $(\mathbf{A}^{-1})' = (\mathbf{A}')^{-1}$; that is, the transpose of A inverse is the inverse of A transpose.

B.4 Determinants 


To every square matrix A there corresponds a number (scalar) known as the determinant of the matrix, which is denoted by det A or by the symbol |A|, where | | means “the determinant of.” Note that a matrix per se has no numerical value, but the determinant of a matrix is a number. For example, if

$$\underset{3 \times 3}{\mathbf{A}} = \begin{bmatrix} 1 & 3 & -7 \\ 2 & 5 & 0 \\ 3 & 8 & 6 \end{bmatrix} \quad \text{then} \quad |\mathbf{A}| = \begin{vmatrix} 1 & 3 & -7 \\ 2 & 5 & 0 \\ 3 & 8 & 6 \end{vmatrix}$$

The |A| in this example is called a determinant of order 3 because it is associated with a matrix of order $3 \times 3$.

Evaluation of a Determinant 

The process of finding the value of a determinant is known as the evaluation, expansion, or 
reduction of the determinant. This is done by manipulating the entries of the matrix in a 
well-defined manner. 

Evaluation of a 2 × 2 Determinant

If

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

its determinant is evaluated as follows:

$$|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

which is obtained by cross-multiplying the elements on the main diagonal and subtracting from it the cross-multiplication of the elements on the other diagonal of matrix A.

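Evaluation of a 3 × 3 Determinant

If

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

then

$$|\mathbf{A}| = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}$$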

A careful examination of the evaluation of a 3 x 3 determinant shows: 

1. Each term in the expansion of the determinant contains one and only one element from 
each row and each column. 

2. The number of elements in each term is the same as the number of rows (or columns) in 
the matrix. Thus, a 2 x 2 determinant has two elements in each term of its expansion, a 
3x3 determinant has three elements in each term of its expansion, and so on. 

3. The terms in the expansion alternate in sign between + and −.

4. A 2 × 2 determinant has two terms in its expansion, and a 3 × 3 determinant has six terms in its expansion. The general rule is: The determinant of order $N \times N$ has $N! = N(N - 1)(N - 2) \cdots 3 \cdot 2 \cdot 1$ terms in its expansion, where $N!$ is read “N factorial.” Following this rule, a determinant of order 5 × 5 will have $5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$ terms in its expansion. 1
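The six-term rule and the general cofactor expansion (which underlies the $N!$ count) can be compared on the $3 \times 3$ matrix used to introduce determinants, A = [1 3 −7; 2 5 0; 3 8 6]. A minimal sketch; the function names are ours, and the recursive version is meant to mirror the definition, not to be efficient.

```python
def det3(a):
    """Six-term expansion of a 3 x 3 determinant, term by term as in the text."""
    return (a[0][0] * a[1][1] * a[2][2] - a[0][0] * a[1][2] * a[2][1]
            + a[0][1] * a[1][2] * a[2][0] - a[0][1] * a[1][0] * a[2][2]
            + a[0][2] * a[1][0] * a[2][1] - a[0][2] * a[1][1] * a[2][0])


def det(a):
    """Any N x N determinant by cofactor expansion along the first row.

    The work grows roughly like N!, so this is only for small matrices.
    """
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j in range(len(a)):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]   # delete row 1 and column j
        total += (-1) ** j * a[0][j] * det(minor)
    return total


A = [[1, 3, -7], [2, 5, 0], [3, 8, 6]]
print(det3(A), det(A))   # -13 -13
```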

Properties of Determinants 

1. A matrix whose determinantal value is zero is called a singular matrix, whereas a 
matrix with a nonzero determinant is called a nonsingular matrix. The inverse of 
a matrix as defined before does not exist for a singular matrix. 

1 To evaluate the determinant of an $N \times N$ matrix A, see the references.
2. If all the elements of any row of A are zero, its determinant is zero. Thus,

$$|\mathbf{A}| = \begin{vmatrix} 0 & 0 & 0 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \end{vmatrix} = 0$$

3. | A' | = | A |; that is, the determinants of A and A transpose are the same. 

4. Interchanging any two rows or any two columns of a matrix A changes the sign of |A|. For example, if B is obtained by interchanging the two rows of a 2 × 2 matrix A, then

$$|\mathbf{A}| = 24 - (-9) = 33 \quad \text{and} \quad |\mathbf{B}| = -9 - (24) = -33$$

5. If every element of a row or a column of A is multiplied by a scalar $\lambda$, then |A| is multiplied by $\lambda$.


EXAMPLE 2

If $\lambda = 5$ and we multiply the first row of a matrix A by 5 to obtain a matrix B, it can be seen that |A| = 36 and |B| = 180, which is 5|A|.


6. If two rows or columns of a matrix are identical, its determinant is zero. 

7. If one row or a column of a matrix is a multiple of another row or column of that matrix, its determinant is zero. Thus, if the first row of A is twice its second row, |A| = 0. More generally, if any row (column) of a matrix is a linear combination of other rows (columns), its determinant is zero.

8. |AB| = |A||B|; that is, the determinant of the product of two matrices is the product of their (individual) determinants.
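Properties 3 through 8 can each be confirmed on a small example. This is a sketch only: the two $3 \times 3$ matrices below are hypothetical (|A| = −10 and |B| = 7), and the helper names are ours.

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([r[:j] + r[j + 1:] for r in a[1:]])
               for j in range(len(a)))


def mat_mul(A, B):
    """Row-by-column product of conformable matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


A = [[2, 1, 0], [1, 3, 4], [0, 5, 6]]   # hypothetical matrix, |A| = -10
B = [[1, 2, 0], [0, 1, 1], [3, 0, 1]]   # hypothetical matrix, |B| = 7
A_t = [list(col) for col in zip(*A)]     # A'

print(det(A_t) == det(A))                                      # property 3
print(det([A[1], A[0], A[2]]) == -det(A))                      # property 4: swap rows 1 and 2
print(det([[5 * x for x in A[0]], A[1], A[2]]) == 5 * det(A))  # property 5
print(det([A[0], A[0], A[2]]) == 0)                            # property 6
print(det([A[0], [2 * x for x in A[0]], A[2]]) == 0)           # property 7
print(det(mat_mul(A, B)) == det(A) * det(B))                   # property 8
```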

Rank of a Matrix 

The rank of a matrix is the order of the largest square submatrix whose determinant is not zero.
EXAMPLE 3

Consider a 3 × 3 matrix A for which it can be seen that |A| = 0. In other words, A is a singular matrix. Hence although its order is 3 × 3, its rank is less than 3. Actually, it is 2, because we can find a 2 × 2 submatrix whose determinant is not zero. For example, if we delete the first row and the first column of A, we obtain a 2 × 2 submatrix whose determinant is $-6$, which is nonzero. Hence the rank of A is 2. As noted previously, the inverse of a singular matrix does not exist. Therefore, for an $N \times N$ matrix A, its rank must be N for its inverse to exist; if it is less than N, A is singular.
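The definition of rank can be applied literally: try square submatrices from the largest order downward and stop at the first one with a nonzero determinant. A brute-force sketch, fine for small matrices only; the function names are ours, and the $3 \times 3$ matrix below is our own illustration, chosen so that |A| = 0 and deleting its first row and first column leaves [4 5; 2 1], whose determinant is −6.

```python
from itertools import combinations


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([r[:j] + r[j + 1:] for r in a[1:]])
               for j in range(len(a)))


def rank(A):
    """Order of the largest square submatrix with a nonzero determinant."""
    m, n = len(A), len(A[0])
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                if det([[A[i][j] for j in cols] for i in rows]) != 0:
                    return r
    return 0   # only the null matrix has rank zero


A = [[3, 6, 6], [0, 4, 5], [3, 2, 1]]   # illustrative singular matrix
print(det(A), rank(A))                   # 0 2
```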


Minor 

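If the $i$th row and $j$th column of an $N \times N$ matrix A are deleted, the determinant of the resulting submatrix is called the minor of the element $a_{ij}$ (the element at the intersection of the $i$th row and the $j$th column) and is denoted by $|M_{ij}|$. For example, if

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

the minor of $a_{11}$ is

$$|M_{11}| = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{23}a_{32}$$

Similarly, the minor of $a_{21}$ is

$$|M_{21}| = \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} = a_{12}a_{33} - a_{13}a_{32}$$

The minors of other elements of A can be found similarly.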


Cofactor 

The cofactor of the element $a_{ij}$ of an $N \times N$ matrix A, denoted by $c_{ij}$, is defined as

$$c_{ij} = (-1)^{i+j}|M_{ij}|$$

In other words, a cofactor is a signed minor, the sign being positive if $i + j$ is even and negative if $i + j$ is odd. Thus, the cofactor of the element $a_{11}$ of the 3 × 3 matrix A given previously is $a_{22}a_{33} - a_{23}a_{32}$, whereas the cofactor of the element $a_{21}$ is $-(a_{12}a_{33} - a_{13}a_{32})$ since the sum of the subscripts 2 and 1 is 3, which is an odd number.

Cofactor Matrix 

Replacing the elements $a_{ij}$ of a matrix A by their cofactors, we obtain a matrix known as the cofactor matrix of A, denoted by (cof A).

Adjoint Matrix 

The adjoint matrix, written as (adj A), is the transpose of the cofactor matrix; that is, 
(adj A) = (cof A)'. 
B.5 Finding the Inverse of a Square Matrix

If A is square and nonsingular (that is, $|\mathbf{A}| \ne 0$), its inverse $\mathbf{A}^{-1}$ can be found as follows:

$$\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}(\text{adj}\,\mathbf{A})$$

The steps involved in the computation are as follows:

1. Find the determinant of A. If it is nonzero, proceed to step 2.

2. Replace each element $a_{ij}$ of A by its cofactor to obtain the cofactor matrix.

3. Transpose the cofactor matrix to obtain the adjoint matrix.

4. Divide each element of the adjoint matrix by |A|.


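Find the inverse of the matrix

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 5 & 7 & 4 \\ 2 & 1 & 3 \end{bmatrix}$$

Step 1. We first find the determinant of the matrix. Applying the rules of expanding a 3 × 3 determinant given previously, we obtain |A| = −24.

Step 2. We now obtain the cofactor matrix, say, C:

$$\mathbf{C} = \begin{bmatrix} \begin{vmatrix} 7 & 4 \\ 1 & 3 \end{vmatrix} & -\begin{vmatrix} 5 & 4 \\ 2 & 3 \end{vmatrix} & \begin{vmatrix} 5 & 7 \\ 2 & 1 \end{vmatrix} \\ -\begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} & \begin{vmatrix} 1 & 3 \\ 2 & 3 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 2 & 1 \end{vmatrix} \\ \begin{vmatrix} 2 & 3 \\ 7 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 3 \\ 5 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 5 & 7 \end{vmatrix} \end{bmatrix} = \begin{bmatrix} 17 & -7 & -9 \\ -3 & -3 & 3 \\ -13 & 11 & -3 \end{bmatrix}$$

Step 3. Transposing the preceding cofactor matrix, we obtain the following adjoint matrix:

$$(\text{adj}\,\mathbf{A}) = \begin{bmatrix} 17 & -3 & -13 \\ -7 & -3 & 11 \\ -9 & 3 & -3 \end{bmatrix}$$

Step 4. We now divide the elements of (adj A) by the determinantal value of −24 to obtain

$$\mathbf{A}^{-1} = -\frac{1}{24}\begin{bmatrix} 17 & -3 & -13 \\ -7 & -3 & 11 \\ -9 & 3 & -3 \end{bmatrix} = \begin{bmatrix} -\frac{17}{24} & \frac{3}{24} & \frac{13}{24} \\ \frac{7}{24} & \frac{3}{24} & -\frac{11}{24} \\ \frac{9}{24} & -\frac{3}{24} & \frac{3}{24} \end{bmatrix}$$

It can be readily verified that

$$\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

which is an identity matrix. The reader should verify that for the illustrative example given in Appendix C (see Section C.10) the inverse of the X'X matrix is as shown in Eq. (C.10.5).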
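The four steps of Section B.5 map one-to-one onto code. A minimal sketch using exact `fractions.Fraction` arithmetic so that entries such as 13/24 are not rounded; the helper names are ours, the function assumes a square matrix of order 2 or more, and it is checked here on the $3 \times 3$ matrix A = [1 2 3; 5 7 4; 2 1 3], whose determinant is −24.

```python
from fractions import Fraction


def minor(a, i, j):
    """Submatrix left after deleting row i and column j of a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def inverse(a):
    """Inverse of a square matrix (order 2 or more) via A^-1 = (adj A) / |A|."""
    d = det(a)                                               # step 1
    if d == 0:
        raise ValueError("matrix is singular; its inverse does not exist")
    n = len(a)
    cof = [[(-1) ** (i + j) * det(minor(a, i, j)) for j in range(n)]
           for i in range(n)]                                # step 2: cofactor matrix
    adj = [list(col) for col in zip(*cof)]                   # step 3: adj A = (cof A)'
    return [[Fraction(x, d) for x in row] for row in adj]    # step 4: divide by |A|


def mat_mul(A, B):
    """Row-by-column product of conformable matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 3], [5, 7, 4], [2, 1, 3]]
A_inv = inverse(A)
print(det(A))                                                    # -24
print(mat_mul(A, A_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])    # True
```

For large matrices, numerical libraries use elimination-based methods instead; the adjoint formula is shown here because it mirrors the steps in the text.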
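B.6 Matrix Differentiation

To follow the material in Appendix CA, Section CA.2, we need some rules regarding matrix differentiation.

If $\mathbf{a}' = [a_1 \;\; a_2 \;\; \cdots \;\; a_n]$ is a row vector of numbers, and

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

is a column vector of the variables $x_1, x_2, \ldots, x_n$, then

$$\frac{\partial(\mathbf{a}'\mathbf{x})}{\partial \mathbf{x}} = \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$$

Consider the quadratic form $\mathbf{x}'\mathbf{A}\mathbf{x}$, where A is a symmetric $n \times n$ matrix, such that

$$\mathbf{x}'\mathbf{A}\mathbf{x} = [x_1 \;\; x_2 \;\; \cdots \;\; x_n] \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Then

$$\frac{\partial(\mathbf{x}'\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}$$

which is a column vector of $n$ elements, or

$$\frac{\partial(\mathbf{x}'\mathbf{A}\mathbf{x})}{\partial \mathbf{x}'} = 2\mathbf{x}'\mathbf{A}$$

which is a row vector of $n$ elements.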
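Both rules can be checked numerically by comparing central-difference derivatives with the closed forms $\mathbf{a}$ and $2\mathbf{A}\mathbf{x}$. A sketch under stated assumptions: the matrix A, the vector a, and the point x are hypothetical, the helper names are ours, and A is symmetric, which the $2\mathbf{A}\mathbf{x}$ rule requires (for a nonsymmetric A the gradient is $(\mathbf{A} + \mathbf{A}')\mathbf{x}$).

```python
def quad_form(x, A):
    """The scalar x'Ax for a vector x (a list) and an n x n matrix A."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation to the partial derivatives of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical symmetric matrix
a = [4.0, -1.0]                # hypothetical vector of numbers
x = [0.5, 2.0]                 # point at which the derivatives are taken

# Rule for a'x: the gradient should equal a.
print(numerical_gradient(lambda v: sum(ai * vi for ai, vi in zip(a, v)), x))
# Rule for x'Ax: the gradient should equal 2Ax = [6.0, 13.0].
print(numerical_gradient(lambda v: quad_form(v, A), x))
print([2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)])
```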
Appendix C


The Matrix Approach 
to Linear Regression 



This appendix presents the classical linear regression model involving $k$ variables ($Y$ and $X_2, X_3, \ldots, X_k$) in matrix algebra notation. Conceptually, the $k$-variable model is a logical extension of the two- and three-variable models considered thus far in this text. Therefore, this appendix presents very few new concepts save for the matrix notation. 1

A great advantage of matrix algebra over scalar algebra (elementary algebra dealing with scalars or real numbers) is that it provides a compact method of handling regression models involving any number of variables; once the $k$-variable model is formulated and solved in matrix notation, the solution applies to one, two, three, or any number of variables.


C.1 The k-Variable Linear Regression Model


If we generalize the two- and three-variable linear regression models, the $k$-variable population regression function (PRF) model involving the dependent variable $Y$ and $k - 1$ explanatory variables $X_2, X_3, \ldots, X_k$ may be written as

$$\text{PRF:}\quad Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + \cdots + \beta_kX_{ki} + u_i \qquad i = 1, 2, 3, \ldots, n \tag{C.1.1}$$

where $\beta_1$ = the intercept, $\beta_2$ to $\beta_k$ = partial slope coefficients, $u$ = stochastic disturbance term, and $i$ = $i$th observation, $n$ being the size of the population. The PRF (C.1.1) is to be interpreted in the usual manner: It gives the mean or expected value of $Y$ conditional upon the fixed (in repeated sampling) values of $X_2, X_3, \ldots, X_k$, that is, $E(Y \mid X_{2i}, X_{3i}, \ldots, X_{ki})$.


¹ Readers not familiar with matrix algebra should review Appendix B before proceeding any further. Appendix B provides the essentials of matrix algebra needed to follow this appendix.
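Equation (C.1.1) is a shorthand expression for the following set of n simultaneous equations:

$$\begin{aligned} Y_1 &= \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_k X_{k1} + u_1 \\ Y_2 &= \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_k X_{k2} + u_2 \\ &\;\;\vdots \\ Y_n &= \beta_1 + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_k X_{kn} + u_n \end{aligned} \tag{C.1.2}$$

Let us write the system of equations (C.1.2) in an alternative but more illuminating way as follows:²

$$\underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}}_{\mathbf{y}\;(n \times 1)} = \underbrace{\begin{bmatrix} 1 & X_{21} & X_{31} & \cdots & X_{k1} \\ 1 & X_{22} & X_{32} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2n} & X_{3n} & \cdots & X_{kn} \end{bmatrix}}_{\mathbf{X}\;(n \times k)} \underbrace{\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}}_{\boldsymbol{\beta}\;(k \times 1)} + \underbrace{\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}}_{\mathbf{u}\;(n \times 1)} \tag{C.1.3}$$

where y = n × 1 column vector of observations on the dependent variable Y
X = n × k matrix giving n observations on k − 1 variables X2 to Xk, the first column of 1's representing the intercept term (this matrix is also known as the data matrix)
β = k × 1 column vector of the unknown parameters β1, β2, …, βk
u = n × 1 column vector of n disturbances ui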

Using the rules of matrix multiplication and addition, the reader should verify that systems 
(C.1.2) and (C.1.3) are equivalent. 
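The equivalence can also be checked mechanically. The following minimal sketch (plain Python, standard library only) uses a small hypothetical data set with n = 4 and k = 3; all numbers are made up for illustration. It multiplies out Xβ + u and compares each row with the corresponding scalar equation of (C.1.2).

```python
def mat_vec(X, b):
    """Return the vector Xb for a matrix X stored as a list of rows."""
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]

# Hypothetical data: columns are the constant 1, X2 and X3 (k = 3, n = 4)
X = [[1.0, 2.0, 5.0],
     [1.0, 3.0, 1.0],
     [1.0, 4.0, 2.0],
     [1.0, 6.0, 3.0]]
beta = [0.5, 2.0, -1.0]        # beta_1, beta_2, beta_3
u = [0.1, -0.2, 0.0, 0.3]      # disturbances u_1, ..., u_4

# Matrix form (C.1.3): y = X beta + u
y_matrix = [xb + ui for xb, ui in zip(mat_vec(X, beta), u)]

# Scalar form (C.1.2): Y_i = beta_1 + beta_2 X_2i + beta_3 X_3i + u_i
y_scalar = [beta[0] + beta[1] * X[i][1] + beta[2] * X[i][2] + u[i] for i in range(4)]

print(y_matrix)
print(y_scalar)
```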

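System (C.1.3) is known as the matrix representation of the general (k-variable) linear regression model. It can be written more compactly as

$$\underset{n\times 1}{\mathbf{y}} = \underset{n\times k}{\mathbf{X}}\;\underset{k\times 1}{\boldsymbol{\beta}} + \underset{n\times 1}{\mathbf{u}} \tag{C.1.4}$$

Where there is no confusion about the dimensions or orders of the matrix X and the vectors y, β, and u, Eq. (C.1.4) may be written simply as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u} \tag{C.1.5}$$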

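As an illustration of the matrix representation, consider the two-variable consumption-income model considered in Chapter 3, namely, Yi = β1 + β2Xi + ui, where Y is consumption expenditure and X is income. Using the data given in Table 3.2, we may write the matrix formulation as

$$\underbrace{\begin{bmatrix} 70 \\ 65 \\ 90 \\ 95 \\ 110 \\ 115 \\ 120 \\ 140 \\ 155 \\ 150 \end{bmatrix}}_{\mathbf{y}\;(10 \times 1)} = \underbrace{\begin{bmatrix} 1 & 80 \\ 1 & 100 \\ 1 & 120 \\ 1 & 140 \\ 1 & 160 \\ 1 & 180 \\ 1 & 200 \\ 1 & 220 \\ 1 & 240 \\ 1 & 260 \end{bmatrix}}_{\mathbf{X}\;(10 \times 2)} \underbrace{\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}}_{\boldsymbol{\beta}\;(2 \times 1)} + \underbrace{\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 \\ u_7 \\ u_8 \\ u_9 \\ u_{10} \end{bmatrix}}_{\mathbf{u}\;(10 \times 1)} \tag{C.1.6}$$

² Following the notation introduced in Appendix B, we shall represent vectors by lowercase boldfaced letters and matrices by uppercase boldfaced letters.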


As in the two- and three-variable cases, our objective is to estimate the parameters of the multiple regression (C.1.1) and to draw inferences about them from the data at hand. In matrix notation this amounts to estimating β and drawing inferences about this β. For the purpose of estimation, we may use the method of ordinary least squares (OLS) or the method of maximum likelihood (ML). But as noted before, these two methods yield identical estimates of the regression coefficients.³ Therefore, we shall confine our attention to the method of OLS.


C.2 Assumptions of the Classical Linear Regression Model in Matrix Notation


The assumptions underlying the classical linear regression model are given in Table C.1; they are presented both in scalar notation and in matrix notation. Assumption 1 given in Eq. (C.2.1) means that the expected value of the disturbance vector u, that is, of each of its elements, is zero. More explicitly, E(u) = 0 means

$$E\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} E(u_1) \\ E(u_2) \\ \vdots \\ E(u_n) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{C.2.1}$$


Assumption 2 (Eq. [C.2.2]) is a compact way of expressing the two assumptions given in Eqs. (3.2.5) and (3.2.2) by the scalar notation. To see this, we can write

$$E(\mathbf{u}\mathbf{u}') = E\left\{\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}[u_1\;\; u_2\;\; \cdots\;\; u_n]\right\}$$

³ The proof that this is so in the k-variable case can be found in the footnote reference given in Chapter 4.
TABLE C.1 Assumptions of the Classical Linear Regression Model

| Scalar Notation | Matrix Notation |
|---|---|
| 1. E(ui) = 0, for each i (3.2.1) | 1. E(u) = 0, where u and 0 are n × 1 column vectors, 0 being a null vector |
| 2. E(ui uj) = 0, i ≠ j (3.2.5); E(ui uj) = σ², i = j (3.2.2) | 2. E(uu') = σ²I, where I is an n × n identity matrix |
| 3. X2, X3, …, Xk are nonstochastic or fixed | 3. The n × k matrix X is nonstochastic, that is, it consists of a set of fixed numbers |
| 4. There is no exact linear relationship among the X variables, that is, no multicollinearity (7.1.9) | 4. The rank of X is ρ(X) = k, where k is the number of columns in X and k is less than the number of observations, n |
| 5. For hypothesis testing, ui ~ N(0, σ²) (4.2.4) | 5. The u vector has a multivariate normal distribution, i.e., u ~ N(0, σ²I) |


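where u' is the transpose of the column vector u, or a row vector. Performing the multiplication, we obtain

$$E(\mathbf{u}\mathbf{u}') = E\begin{bmatrix} u_1^2 & u_1u_2 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & \cdots & u_2u_n \\ \vdots & \vdots & & \vdots \\ u_nu_1 & u_nu_2 & \cdots & u_n^2 \end{bmatrix}$$

Applying the expectations operator E to each element of the preceding matrix, we obtain

$$E(\mathbf{u}\mathbf{u}') = \begin{bmatrix} E(u_1^2) & E(u_1u_2) & \cdots & E(u_1u_n) \\ E(u_2u_1) & E(u_2^2) & \cdots & E(u_2u_n) \\ \vdots & \vdots & & \vdots \\ E(u_nu_1) & E(u_nu_2) & \cdots & E(u_n^2) \end{bmatrix} \tag{C.2.2}$$

Because of the assumptions of homoscedasticity and no serial correlation, matrix (C.2.2) reduces to

$$E(\mathbf{u}\mathbf{u}') = \begin{bmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2\mathbf{I} \tag{C.2.3}$$

where I is an n × n identity matrix.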

Matrix (C.2.2) (and its representation given in Eq. [C.2.3]) is called the variance-covariance matrix of the disturbances ui; the elements on the main diagonal of this matrix (running from the upper left corner to the lower right corner) give the variances, and the elements off the main diagonal give the covariances.⁴ Note that the variance-covariance matrix is symmetric: The elements above and below the main diagonal are reflections of one another.

Assumption 3 in Table C.1 states that the n × k matrix X is nonstochastic; that is, it consists of fixed numbers. As noted previously, our regression analysis is conditional regression analysis, conditional upon the fixed values of the X variables.

Assumption 4 states that the X matrix has full column rank equal to k, the number of columns in the matrix. This means that the columns of the X matrix are linearly independent; that is, there is no exact linear relationship among the X variables. In other words, there is no multicollinearity. In scalar notation this is equivalent to saying that there exists no set of numbers λ1, λ2, …, λk, not all zero, such that (cf. Eq. [7.1.8])

$$\lambda_1 X_{1i} + \lambda_2 X_{2i} + \cdots + \lambda_k X_{ki} = 0 \tag{C.2.4}$$

where X1i = 1 for all i (to allow for the column of 1's in the X matrix). In matrix notation, Eq. (C.2.4) can be represented as

$$\boldsymbol{\lambda}'\mathbf{x} = 0 \tag{C.2.5}$$

where λ' is a 1 × k row vector and x is a k × 1 column vector.

If an exact linear relationship such as Eq. (C.2.4) exists, the variables are said to be collinear. If, on the other hand, Eq. (C.2.4) holds true only if λ1 = λ2 = λ3 = ⋯ = 0, then the X variables are said to be linearly independent. An intuitive reason for the no multicollinearity assumption was given in Chapter 7, and we explored this assumption further in Chapter 10.
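Assumption 4 can be checked numerically by computing the rank of X. The sketch below is a minimal illustration in plain Python; it uses Gaussian elimination with partial pivoting and a tolerance `tol` (an illustrative choice, not from the text). It compares the X matrix of Eq. (C.1.6) with a hypothetical X that adds a third column equal to 3 + 0.5 × income, so that λ = (3, 0.5, −1) satisfies Eq. (C.2.4).

```python
def matrix_rank(M, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    A = [[float(v) for v in row] for row in M]
    n_rows, n_cols = len(A), len(A[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest absolute entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < tol:
            continue  # no pivot: this column is a combination of earlier ones
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, n_rows):
            factor = A[r][col] / A[rank][col]
            for c in range(col, n_cols):
                A[r][c] -= factor * A[rank][c]
        rank += 1
    return rank

income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]   # X column of (C.1.6)

# Intercept column plus income: k = 2 columns
X = [[1, x] for x in income]

# Hypothetical third regressor that is an exact linear function of the other two
X_collinear = [[1, x, 3 + 0.5 * x] for x in income]

print(matrix_rank(X), "of", len(X[0]))                       # 2 of 2
print(matrix_rank(X_collinear), "of", len(X_collinear[0]))   # 2 of 3
```

In the second case ρ(X) = 2 < k = 3, so Assumption 4 fails and, as shown in Section C.3, (X'X) cannot be inverted.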


C.3 OLS Estimation 


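To obtain the OLS estimate of β, let us first write the k-variable sample regression function (SRF):

$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \cdots + \hat\beta_k X_{ki} + \hat u_i \tag{C.3.1}$$

which can be written more compactly in matrix notation as

$$\mathbf{y} = \mathbf{X}\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}} \tag{C.3.2}$$

and in matrix form as

$$\underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}}_{\mathbf{y}\;(n \times 1)} = \underbrace{\begin{bmatrix} 1 & X_{21} & X_{31} & \cdots & X_{k1} \\ 1 & X_{22} & X_{32} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2n} & X_{3n} & \cdots & X_{kn} \end{bmatrix}}_{\mathbf{X}\;(n \times k)} \underbrace{\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \vdots \\ \hat\beta_k \end{bmatrix}}_{\hat{\boldsymbol{\beta}}\;(k \times 1)} + \underbrace{\begin{bmatrix} \hat u_1 \\ \hat u_2 \\ \vdots \\ \hat u_n \end{bmatrix}}_{\hat{\mathbf{u}}\;(n \times 1)} \tag{C.3.3}$$

where β̂ is a k-element column vector of the OLS estimators of the regression coefficients and where û is an n × 1 column vector of n residuals.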


⁴ By definition, the variance of ui = E[ui − E(ui)]² and the covariance between ui and uj = E[ui − E(ui)][uj − E(uj)]. But because of the assumption E(ui) = 0 for each i, we have the variance-covariance matrix (C.2.3).
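As in the two- and three-variable models, in the k-variable case the OLS estimators are obtained by minimizing

$$\sum \hat u_i^2 = \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \cdots - \hat\beta_k X_{ki}\right)^2 \tag{C.3.4}$$

where Σûi² is the residual sum of squares (RSS). In matrix notation, this amounts to minimizing û'û since

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = [\hat u_1\;\; \hat u_2\;\; \cdots\;\; \hat u_n]\begin{bmatrix} \hat u_1 \\ \hat u_2 \\ \vdots \\ \hat u_n \end{bmatrix} = \hat u_1^2 + \hat u_2^2 + \cdots + \hat u_n^2 = \sum \hat u_i^2 \tag{C.3.5}$$

Now from Eq. (C.3.2) we obtain

$$\hat{\mathbf{u}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \tag{C.3.6}$$

Therefore,

$$\begin{aligned} \hat{\mathbf{u}}'\hat{\mathbf{u}} &= (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \\ &= \mathbf{y}'\mathbf{y} - 2\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} + \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} \end{aligned} \tag{C.3.7}$$

where use is made of the properties of the transpose of a matrix, namely, (Xβ̂)' = β̂'X'; and since β̂'X'y is a scalar (a real number), it is equal to its transpose y'Xβ̂.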

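Equation (C.3.7) is the matrix representation of (C.3.4). In scalar notation, the method of OLS consists in so estimating β1, β2, …, βk that Σûi² is as small as possible. This is done by differentiating Eq. (C.3.4) partially with respect to β̂1, β̂2, …, β̂k and setting the resulting expressions to zero. This process yields k simultaneous equations in k unknowns, the normal equations of the least-squares theory. As shown in Appendix CA, Section CA.1, these equations are as follows:⁵

$$\begin{aligned} n\hat\beta_1 + \hat\beta_2\sum X_{2i} + \hat\beta_3\sum X_{3i} + \cdots + \hat\beta_k\sum X_{ki} &= \sum Y_i \\ \hat\beta_1\sum X_{2i} + \hat\beta_2\sum X_{2i}^2 + \hat\beta_3\sum X_{2i}X_{3i} + \cdots + \hat\beta_k\sum X_{2i}X_{ki} &= \sum X_{2i}Y_i \\ \hat\beta_1\sum X_{3i} + \hat\beta_2\sum X_{3i}X_{2i} + \hat\beta_3\sum X_{3i}^2 + \cdots + \hat\beta_k\sum X_{3i}X_{ki} &= \sum X_{3i}Y_i \\ &\;\;\vdots \\ \hat\beta_1\sum X_{ki} + \hat\beta_2\sum X_{ki}X_{2i} + \hat\beta_3\sum X_{ki}X_{3i} + \cdots + \hat\beta_k\sum X_{ki}^2 &= \sum X_{ki}Y_i \end{aligned} \tag{C.3.8}$$

In matrix form, Eq. (C.3.8) can be represented as

$$\underbrace{\begin{bmatrix} n & \sum X_{2i} & \sum X_{3i} & \cdots & \sum X_{ki} \\ \sum X_{2i} & \sum X_{2i}^2 & \sum X_{2i}X_{3i} & \cdots & \sum X_{2i}X_{ki} \\ \sum X_{3i} & \sum X_{3i}X_{2i} & \sum X_{3i}^2 & \cdots & \sum X_{3i}X_{ki} \\ \vdots & \vdots & \vdots & & \vdots \\ \sum X_{ki} & \sum X_{ki}X_{2i} & \sum X_{ki}X_{3i} & \cdots & \sum X_{ki}^2 \end{bmatrix}}_{(\mathbf{X}'\mathbf{X})} \underbrace{\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \\ \vdots \\ \hat\beta_k \end{bmatrix}}_{\hat{\boldsymbol{\beta}}} = \underbrace{\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_{21} & X_{22} & \cdots & X_{2n} \\ X_{31} & X_{32} & \cdots & X_{3n} \\ \vdots & \vdots & & \vdots \\ X_{k1} & X_{k2} & \cdots & X_{kn} \end{bmatrix}}_{\mathbf{X}'} \underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix}}_{\mathbf{y}} \tag{C.3.9}$$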


⁵ These equations can be remembered easily. Start with the equation Yi = β̂1 + β̂2X2i + β̂3X3i + ⋯ + β̂kXki. Summing this equation over the n values gives the first equation in (C.3.8); multiplying it by X2 on both sides and summing over n gives the second equation; multiplying it by X3 on both sides and summing over n gives the third equation; and so on. In passing, note that the first equation in (C.3.8) gives at once β̂1 = Ȳ − β̂2X̄2 − ⋯ − β̂kX̄k (cf. [7.4.6]).
or, more compactly, as

$$(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} \tag{C.3.10}$$

Note these features of the (X'X) matrix: (1) It gives the raw sums of squares and cross products of the X variables, one of which is the intercept term taking the value of 1 for each observation. The elements on the main diagonal give the raw sums of squares, and those off the main diagonal give the raw sums of cross products (by raw we mean in original units of measurement). (2) It is symmetrical since the cross product between X2i and X3i is the same as that between X3i and X2i. (3) It is of order (k × k), that is, k rows and k columns.
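The three features can be seen directly by forming X'X and X'y for the consumption-income data of Eq. (C.1.6). The following is a minimal sketch in plain Python; `transpose` and `mat_mul` are small helper functions written for this illustration.

```python
# Consumption-income data of Eq. (C.1.6)
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
X = [[1, x] for x in income]            # first column of 1's for the intercept

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    """Matrix product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

Xt = transpose(X)
XtX = mat_mul(Xt, X)                    # k x k: raw sums of squares and cross products
Xty = mat_mul(Xt, [[y] for y in Y])     # k x 1: raw cross products of the X's with y

print(XtX)   # [[10, 1700], [1700, 322000]]
print(Xty)   # [[1110], [205500]]
```

The diagonal of XtX holds n = 10 and ΣXi² = 322,000, the off-diagonal entries hold ΣXi = 1700, and the matrix equals its own transpose.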

In Eq. (C.3.10) the known quantities are (X'X) and (X'y) (the cross product between the X variables and y) and the unknown is β̂. Now using matrix algebra, if the inverse of (X'X) exists, say, (X'X)⁻¹ (which requires X to have full column rank k, Assumption 4), then premultiplying both sides of Eq. (C.3.10) by this inverse, we obtain

$$(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

But since (X'X)⁻¹(X'X) = I, an identity matrix of order k × k, we get

$$\mathbf{I}\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

or

$$\underset{k\times 1}{\hat{\boldsymbol{\beta}}} = \underset{k\times k}{(\mathbf{X}'\mathbf{X})^{-1}}\;\underset{(k\times n)}{\mathbf{X}'}\;\underset{(n\times 1)}{\mathbf{y}} \tag{C.3.11}$$

Equation (C.3.11) is a fundamental result of the OLS theory in matrix notation. It shows how the β̂ vector can be estimated from the given data. Although Eq. (C.3.11) was obtained from Eq. (C.3.9), it can be obtained directly from Eq. (C.3.7) by differentiating û'û with respect to β̂. The proof is given in Appendix CA, Section CA.2.


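An Illustration

As an illustration of the matrix methods developed so far, let us work a consumption-income example using the data in Eq. (C.1.6). For the two-variable case we have

$$(\mathbf{X}'\mathbf{X}) = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ X_1 & X_2 & X_3 & \cdots & X_n \end{bmatrix} \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ 1 & X_3 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}$$

and

$$\mathbf{X}'\mathbf{y} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ X_1 & X_2 & X_3 & \cdots & X_n \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$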
Using the data given in Eq. (C.1.6), we obtain

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 10 & 1700 \\ 1700 & 322000 \end{bmatrix}$$

and

$$\mathbf{X}'\mathbf{y} = \begin{bmatrix} 1110 \\ 205500 \end{bmatrix}$$

Using the rules of matrix inversion given in Appendix B, Section B.3, we can see that the inverse of the preceding (X'X) matrix is

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 0.97576 & -0.005152 \\ -0.005152 & 0.0000303 \end{bmatrix}$$

Therefore,

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} 0.97576 & -0.005152 \\ -0.005152 & 0.0000303 \end{bmatrix} \begin{bmatrix} 1110 \\ 205500 \end{bmatrix} = \begin{bmatrix} 24.3571 \\ 0.5079 \end{bmatrix}$$

Using the computer, we obtained β̂1 = 24.4545 and β̂2 = 0.5091. The difference between the two sets of estimates is due to rounding errors. In passing, note that in working on a desk calculator it is essential to obtain results to several significant digits to minimize the rounding errors.
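The effect of rounding the inverse can be seen by evaluating Eq. (C.3.11) twice. The minimal sketch below (plain Python, with `fractions.Fraction` for exact arithmetic) first keeps (X'X)⁻¹ exact and then uses the rounded inverse printed above. The exact version reproduces the computer values 24.4545 and 0.5091, while the rounded one lands near 24.36 and 0.508.

```python
from fractions import Fraction

XtX = [[10, 1700], [1700, 322000]]   # from the text
Xty = [1110, 205500]

def inverse_2x2(M):
    """Exact inverse of a 2 x 2 matrix [[a, b], [c, d]] using rational arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)        # nonzero because X has full column rank
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(M, v):
    """Matrix-vector product Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# (C.3.11) with the inverse kept exact
beta_exact = [float(b) for b in mat_vec(inverse_2x2(XtX), Xty)]

# The same product with the inverse rounded as printed in the text
inv_rounded = [[0.97576, -0.005152], [-0.005152, 0.0000303]]
beta_rounded = mat_vec(inv_rounded, Xty)

print([round(b, 4) for b in beta_exact])     # [24.4545, 0.5091]
print([round(b, 4) for b in beta_rounded])
```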


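Variance-Covariance Matrix of β̂

Matrix methods enable us to develop formulas not only for the variance of β̂i, any given element of β̂, but also for the covariance between any two elements of β̂, say, β̂i and β̂j. We need these variances and covariances for the purpose of statistical inference.

By definition, the variance-covariance matrix of β̂ is (compare Eq. [C.2.2])

$$\text{var-cov}(\hat{\boldsymbol{\beta}}) = E\{[\hat{\boldsymbol{\beta}} - E(\hat{\boldsymbol{\beta}})][\hat{\boldsymbol{\beta}} - E(\hat{\boldsymbol{\beta}})]'\}$$

which can be written explicitly as

$$\text{var-cov}(\hat{\boldsymbol{\beta}}) = \begin{bmatrix} \text{var}(\hat\beta_1) & \text{cov}(\hat\beta_1, \hat\beta_2) & \cdots & \text{cov}(\hat\beta_1, \hat\beta_k) \\ \text{cov}(\hat\beta_2, \hat\beta_1) & \text{var}(\hat\beta_2) & \cdots & \text{cov}(\hat\beta_2, \hat\beta_k) \\ \vdots & \vdots & & \vdots \\ \text{cov}(\hat\beta_k, \hat\beta_1) & \text{cov}(\hat\beta_k, \hat\beta_2) & \cdots & \text{var}(\hat\beta_k) \end{bmatrix} \tag{C.3.12}$$

It is shown in Appendix CA, Section CA.3, that the preceding variance-covariance matrix can be obtained from the following formula:

$$\text{var-cov}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \tag{C.3.13}$$

where σ² is the homoscedastic variance of ui and (X'X)⁻¹ is the inverse matrix appearing in Eq. (C.3.11), which gives the OLS estimator β̂.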
In the two- and three-variable linear regression models an unbiased estimator of σ² was given by σ̂² = Σûi²/(n − 2) and σ̂² = Σûi²/(n − 3), respectively. In the k-variable case, the corresponding formula is

$$\hat\sigma^2 = \frac{\hat{\mathbf{u}}'\hat{\mathbf{u}}}{n - k} = \frac{\sum \hat u_i^2}{n - k} \tag{C.3.14}$$

where there are now n − k df. (Why?)

Although in principle û'û can be computed from the estimated residuals, in practice it can be obtained directly as follows. Recalling that Σûi² (= RSS) = TSS − ESS, in the two-variable case we may write

$$\sum \hat u_i^2 = \sum y_i^2 - \hat\beta_2^2 \sum x_i^2 \tag{3.3.6}$$

and in the three-variable case

$$\sum \hat u_i^2 = \sum y_i^2 - \hat\beta_2 \sum y_i x_{2i} - \hat\beta_3 \sum y_i x_{3i} \tag{7.4.19}$$

By extending this principle, it can be seen that for the k-variable model

$$\sum \hat u_i^2 = \sum y_i^2 - \hat\beta_2 \sum y_i x_{2i} - \cdots - \hat\beta_k \sum y_i x_{ki} \tag{C.3.15}$$

In matrix notation,

$$\text{TSS:}\quad \sum y_i^2 = \mathbf{y}'\mathbf{y} - n\bar Y^2 \tag{C.3.16}$$

$$\text{ESS:}\quad \hat\beta_2 \sum y_i x_{2i} + \cdots + \hat\beta_k \sum y_i x_{ki} = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar Y^2 \tag{C.3.17}$$

where the term nȲ² is known as the correction for mean.⁶ Therefore,

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} \tag{C.3.18}$$

Once û'û is obtained, σ̂² can be easily computed from Eq. (C.3.14), which, in turn, will enable us to estimate the variance-covariance matrix (C.3.13).

For our illustrative example (using the unrounded estimates in the product),

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = 132100 - [24.4545\;\; 0.5091]\begin{bmatrix} 1110 \\ 205500 \end{bmatrix} = 337.273$$

Hence, σ̂² = 337.273/8 = 42.1591, which is approximately the value obtained previously in Chapter 3.
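The chain (C.3.18) → (C.3.14) → (C.3.13) can be sketched in a few lines. The minimal illustration below (plain Python) keeps everything exact with `fractions.Fraction`, so û'û comes out as 337.2727…, which is why the full-precision coefficients are needed in the product above. It then replaces σ² in (C.3.13) by its estimate σ̂².

```python
from fractions import Fraction

n, k = 10, 2
yty = 132100                          # y'y, the raw sum of squares of Y
XtX = [[10, 1700], [1700, 322000]]
Xty = [1110, 205500]

# Exact (X'X)^{-1} and beta-hat = (X'X)^{-1} X'y, as in (C.3.11)
det = Fraction(XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0])
XtX_inv = [[XtX[1][1] / det, -XtX[0][1] / det],
           [-XtX[1][0] / det, XtX[0][0] / det]]
beta = [sum(XtX_inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]

rss = yty - sum(b * c for b, c in zip(beta, Xty))   # (C.3.18): y'y - beta'X'y
sigma2_hat = rss / (n - k)                          # (C.3.14)
# (C.3.13) with sigma^2 replaced by its unbiased estimate
var_cov = [[sigma2_hat * XtX_inv[i][j] for j in range(k)] for i in range(k)]

print(round(float(rss), 3), round(float(sigma2_hat), 4))   # 337.273 42.1591
print([[round(float(v), 6) for v in row] for row in var_cov])
```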


⁶ Note: Σyi² = Σ(Yi − Ȳ)² = ΣYi² − nȲ² = y'y − nȲ². Therefore, without the correction term, y'y will give simply the raw sum of squares, not the sum of squared deviations.
Properties of OLS Vector β̂

In the two- and three-variable cases we know that the OLS estimators are linear and unbiased, and in the class of all linear unbiased estimators they have minimum variance (the Gauss-Markov property). In short, the OLS estimators are best linear unbiased estimators (BLUE). This property extends to the entire β̂ vector; that is, β̂ is linear (each of its elements is a linear function of Y, the dependent variable), E(β̂) = β, that is, the expected value of each element of β̂ is equal to the corresponding element of the true β, and in the class of all linear unbiased estimators of β, the OLS estimator β̂ has minimum variance.

The proof is given in Appendix CA, Section CA.4. As stated in the introduction, the k-variable case is in most cases a straight extension of the two- and three-variable cases.


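C.4 The Coefficient of Determination R² in Matrix Notation


The coefficient of determination R² has been defined as

$$R^2 = \frac{\text{ESS}}{\text{TSS}}$$

In the two-variable case,

$$R^2 = \frac{\hat\beta_2^2 \sum x_i^2}{\sum y_i^2} \tag{3.5.6}$$

and in the three-variable case

$$R^2 = \frac{\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}}{\sum y_i^2} \tag{7.5.5}$$

Generalizing, we obtain for the k-variable case

$$R^2 = \frac{\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i} + \cdots + \hat\beta_k \sum y_i x_{ki}}{\sum y_i^2} \tag{C.4.1}$$

By using Eqs. (C.3.16) and (C.3.17), Eq. (C.4.1) can be written as

$$R^2 = \frac{\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar Y^2}{\mathbf{y}'\mathbf{y} - n\bar Y^2} \tag{C.4.2}$$

which gives the matrix representation of R².

For our illustrative example,

$$\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = [24.3571\;\; 0.5079]\begin{bmatrix} 1110 \\ 205500 \end{bmatrix} = 131{,}409.831$$

y'y = 132,100

and

nȲ² = 123,210

Plugging these values into Eq. (C.4.2), we see that R² = 0.9224, which differs from the value obtained before only because of the rounding errors.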
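How much the rounding matters can be seen by evaluating Eq. (C.4.2) twice. The minimal sketch below (plain Python) uses the four-digit coefficients from the text and then the full-precision OLS coefficients 8,070,000/330,000 and 168,000/330,000. Only β̂ changes between the two calls.

```python
n = 10
yty = 132100                      # y'y
Xty = [1110, 205500]              # X'y; its first element is sum(Y)
y_bar = Xty[0] / n                # Ybar = 111
correction = n * y_bar ** 2       # n * Ybar^2 = 123,210, the correction for mean

def r_squared(beta):
    """R^2 from (C.4.2): (beta'X'y - n*Ybar^2) / (y'y - n*Ybar^2)."""
    ess = sum(b * c for b, c in zip(beta, Xty)) - correction
    return ess / (yty - correction)

print(round(r_squared([24.3571, 0.5079]), 4))                   # 0.9224
print(round(r_squared([8070000 / 330000, 168000 / 330000]), 4)) # 0.9621
```

Because β̂'X'y and y'y are both large and nearly equal to nȲ², small errors in β̂ are magnified in the numerator, so the coefficients should be carried to full precision.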
C.5 The Correlation Matrix


In the previous chapters we came across the zero-order, or simple, correlation coefficients r12, r13, r23, and the partial, or first-order, correlations r12.3, r13.2, r23.1, and their interrelationships. In the k-variable case, we shall have in all k(k − 1)/2 zero-order correlation coefficients. (Why?) These k(k − 1)/2 correlations can be put into a matrix, called the correlation matrix R, as follows:

$$\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & \cdots & r_{1k} \\ r_{21} & r_{22} & r_{23} & \cdots & r_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{k1} & r_{k2} & r_{k3} & \cdots & r_{kk} \end{bmatrix} = \begin{bmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1k} \\ r_{21} & 1 & r_{23} & \cdots & r_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{k1} & r_{k2} & r_{k3} & \cdots & 1 \end{bmatrix} \tag{C.5.1}$$

where the subscript 1, as before, denotes the dependent variable Y (r12 means correlation coefficient between Y and X2, and so on) and where use is made of the fact that the coefficient of correlation of a variable with respect to itself is always 1 (r11 = r22 = ⋯ = rkk = 1).

From the correlation matrix R one can obtain correlation coefficients of first order (see Chapter 7) and of higher order such as r12.34…k. (See Exercise C.4.) Many computer programs routinely compute the R matrix. We have used the correlation matrix in Chapter 10.
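As an illustration, the minimal sketch below (plain Python, standard library only) builds the correlation matrix (C.5.1) for the three variables of Table C.4, with Y in position 1 as in the text. The helper `corr` is the usual zero-order correlation coefficient, written out for this example. The k(k − 1)/2 = 3 distinct off-diagonal entries are r12, r13, and r23.

```python
from math import sqrt

# Table C.4: PPCE (Y), PPDI (X2), time (X3), 1956-1970
Y  = [1673, 1688, 1666, 1735, 1749, 1756, 1815, 1867, 1948, 2048, 2128, 2165, 2257, 2316, 2324]
X2 = [1839, 1844, 1831, 1881, 1883, 1910, 1969, 2016, 2126, 2239, 2336, 2404, 2487, 2535, 2595]
X3 = list(range(1, 16))

def corr(a, b):
    """Zero-order correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)

variables = [Y, X2, X3]      # index 0 plays the role of subscript 1 (the dependent variable)
R = [[corr(a, b) for b in variables] for a in variables]

for row in R:
    print([round(r, 4) for r in row])
```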

C.6 Hypothesis Testing about Individual Regression Coefficients in Matrix Notation

For reasons spelled out in the previous chapters, if our objective is inference as well as estimation, we shall have to assume that the disturbances ui follow some probability distribution. Also for reasons given previously, in regression analysis we usually assume that each ui follows the normal distribution with zero mean and constant variance σ². In matrix notation, we have

$$\mathbf{u} \sim N(\mathbf{0}, \sigma^2\mathbf{I}) \tag{C.6.1}$$

where u and 0 are n × 1 column vectors and I is an n × n identity matrix, 0 being the null vector.

Given the normality assumption, we know that in two- and three-variable linear regression models (1) the OLS estimators β̂i and the ML estimators β̃i are identical, but the ML estimator σ̃² is biased, although this bias can be removed by using the unbiased OLS estimator σ̂²; and (2) the OLS estimators β̂i are also normally distributed. Generalizing, in the k-variable case we can show that

$$\hat{\boldsymbol{\beta}} \sim N[\boldsymbol{\beta}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}] \tag{C.6.2}$$

that is, each element of β̂ is normally distributed with mean equal to the corresponding element of true β and the variance given by σ² times the appropriate diagonal element of the inverse matrix (X'X)⁻¹.
Since in practice σ² is unknown, it is estimated by σ̂². Then by the usual shift to the t distribution, it follows that each element of β̂ follows the t distribution with n − k df. Symbolically,

$$t = \frac{\hat\beta_i - \beta_i}{\text{se}(\hat\beta_i)} \tag{C.6.3}$$

with n − k df, where β̂i is any element of β̂ and se(β̂i) is σ̂ times the square root of the corresponding diagonal element of (X'X)⁻¹.

The t distribution can therefore be used to test hypotheses about the true βi as well as to establish confidence intervals about it. The actual mechanics have already been illustrated in Chapters 5 and 8. For a fully worked example, see Section C.10.
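For the consumption-income example the mechanics of Eq. (C.6.3) reduce to a few lines. The minimal sketch below (plain Python) takes the full-precision β̂ and (X'X)⁻¹ from Section C.3, estimates σ̂² by (C.3.14), and forms t ratios under the null hypothesis βi = 0. The resulting ratios would be compared with the critical t value for n − k = 8 df.

```python
from math import sqrt

n, k = 10, 2
beta = [8070000 / 330000, 168000 / 330000]          # full-precision OLS estimates
XtX_inv = [[322000 / 330000, -1700 / 330000],
           [-1700 / 330000, 10 / 330000]]           # (X'X)^{-1} for (C.1.6)
rss = 132100 - (beta[0] * 1110 + beta[1] * 205500)  # (C.3.18)
sigma2_hat = rss / (n - k)                          # (C.3.14)

# Standard errors: square roots of the diagonal of sigma-hat^2 (X'X)^{-1}
se = [sqrt(sigma2_hat * XtX_inv[i][i]) for i in range(k)]

# (C.6.3) evaluated under the null hypothesis beta_i = 0
t_ratios = [b / s for b, s in zip(beta, se)]

print([round(s, 4) for s in se])        # about [6.4138, 0.0357]
print([round(t, 4) for t in t_ratios])
```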


C.7 Testing the Overall Significance of Regression: Analysis of Variance in Matrix Notation

In Chapter 8 we developed the ANOVA technique (1) to test the overall significance of the estimated regression, that is, to test the null hypothesis that the true (partial) slope coefficients are simultaneously equal to zero, and (2) to assess the incremental contribution of an explanatory variable. The ANOVA technique can be easily extended to the k-variable case. Recall that the ANOVA technique consists of decomposing the TSS into two components: the ESS and the RSS. The matrix expressions for these three sums of squares are already given in Eqs. (C.3.16), (C.3.17), and (C.3.18), respectively. The degrees of freedom associated with these sums of squares are n − 1, k − 1, and n − k, respectively. (Why?) Then, following Chapter 8, Table 8.1, we can set up Table C.2.

Assuming that the disturbances ui are normally distributed and the null hypothesis is β2 = β3 = ⋯ = βk = 0, and following Chapter 8, one can show that

$$F = \frac{(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar Y^2)/(k - 1)}{(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y})/(n - k)} \tag{C.7.1}$$

follows the F distribution with k − 1 and n − k df.

In Chapter 8 we saw that, under the assumptions stated previously, there is a close relationship between F and R², namely,

$$F = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} \tag{8.4.11}$$

Therefore, the ANOVA Table C.2 can be expressed as Table C.3. One advantage of Table C.3 over Table C.2 is that the entire analysis can be done in terms of R²; one need not consider the term (y'y − nȲ²), for it drops out in the F ratio.
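That the two forms of the F ratio coincide can be checked with the sums of squares reported in Table C.5 for the PPCE regression (n = 15, k = 3). The minimal sketch below (plain Python) computes F from Eq. (C.7.1) and again from Eq. (8.4.11) with R² = ESS/TSS; the term y'y − nȲ² cancels, so the two agree.

```python
# Sums of squares for the PPCE regression (Table C.5): n = 15, k = 3
n, k = 15, 3
ess = 828_144.47786
rss = 1_976.85574
tss = ess + rss

# (C.7.1): F from the sums of squares
f_ss = (ess / (k - 1)) / (rss / (n - k))

# (8.4.11): the same F written in terms of R^2 = ESS/TSS
r2 = ess / tss
f_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(round(f_ss, 2), round(f_r2, 2))
```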


TABLE C.2 Matrix Formulation of the ANOVA Table for k-Variable Linear Regression Model

| Source of Variation | SS | df | MSS |
|---|---|---|---|
| Due to regression (that is, due to X2, X3, …, Xk) | β̂'X'y − nȲ² | k − 1 | (β̂'X'y − nȲ²)/(k − 1) |
| Due to residuals | y'y − β̂'X'y | n − k | (y'y − β̂'X'y)/(n − k) |
| Total | y'y − nȲ² | n − 1 | |
TABLE C.3 k-Variable ANOVA Table in Matrix Form in Terms of R²

| Source of Variation | SS | df | MSS |
|---|---|---|---|
| Due to regression (that is, due to X2, X3, …, Xk) | R²(y'y − nȲ²) | k − 1 | R²(y'y − nȲ²)/(k − 1) |
| Due to residuals | (1 − R²)(y'y − nȲ²) | n − k | (1 − R²)(y'y − nȲ²)/(n − k) |
| Total | y'y − nȲ² | n − 1 | |



C.8 Testing Linear Restrictions: General F Testing Using Matrix Notation

In Section 8.6 we introduced the general F test to test the validity of linear restrictions imposed on one or more parameters of the k-variable linear regression model. The appropriate test was given in (8.6.9) (or its equivalent, Eq. [8.6.10]). The matrix counterpart of (8.6.9) can be easily derived.

Let

ûR = the residual vector from the restricted least-squares regression
ûUR = the residual vector from the unrestricted least-squares regression

Then

ûR'ûR = ΣûR² = RSS from the restricted regression
ûUR'ûUR = ΣûUR² = RSS from the unrestricted regression
m = number of linear restrictions
k = number of parameters (including the intercept) in the unrestricted regression
n = number of observations

The matrix counterpart of Eq. (8.6.9) is then

$$F = \frac{(\hat{\mathbf{u}}_R'\hat{\mathbf{u}}_R - \hat{\mathbf{u}}_{UR}'\hat{\mathbf{u}}_{UR})/m}{(\hat{\mathbf{u}}_{UR}'\hat{\mathbf{u}}_{UR})/(n - k)} \tag{C.8.1}$$

which follows the F distribution with (m, n − k) df. As usual, if the computed F value from Eq. (C.8.1) exceeds the critical F value, we can reject the restricted regression; otherwise, we do not reject it.
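A small case shows Eq. (C.8.1) at work. The minimal sketch below (plain Python) uses the consumption-income example with the single restriction β2 = 0 (m = 1). The restricted model then contains only the intercept, so its RSS is the TSS, y'y − nȲ² = 8,890. With one restriction the F value should equal the square of the t ratio for β̂2, and the code checks that.

```python
def general_f(rss_r, rss_ur, m, n, k):
    """General F statistic of (C.8.1) for m linear restrictions."""
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))

n, k, m = 10, 2, 1
beta = [8070000 / 330000, 168000 / 330000]               # unrestricted OLS estimates
rss_ur = 132100 - (beta[0] * 1110 + beta[1] * 205500)    # unrestricted RSS, about 337.27
rss_r = 132100 - n * 111 ** 2                            # restricted RSS = TSS = 8,890

F = general_f(rss_r, rss_ur, m, n, k)

# t ratio for beta_2: 10/330000 is the (2, 2) element of (X'X)^{-1}
se_b2 = (rss_ur / (n - k) * 10 / 330000) ** 0.5
t_b2 = beta[1] / se_b2

print(round(F, 2), round(t_b2 ** 2, 2))
```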

C.9 Prediction Using Multiple Regression: Matrix Formulation 

In Section 8.8 we discussed, using scalar notation, how the estimated multiple regression 
can be used for predicting (1) the mean and (2) individual values of Y, given the values of 
the X regressors. In this section we show how to express these predictions in matrix form. 
We also present the formulas to estimate the variances and standard errors of the predicted 
values; in Chapter 8 we noted that these formulas are better handled in matrix notation, for 
the scalar or algebraic expressions of these formulas become rather unwieldy. 


Mean Prediction

Let

$$\mathbf{x}_0 = \begin{bmatrix} 1 \\ X_{02} \\ X_{03} \\ \vdots \\ X_{0k} \end{bmatrix} \tag{C.9.1}$$

be the vector of values of the X variables for which we wish to predict Ŷ0, the mean prediction of Y.

Now the estimated multiple regression, in scalar form, is

$$\hat Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \cdots + \hat\beta_k X_{ki} \tag{C.9.2}$$

which in matrix notation can be written compactly as

$$\hat Y_i = \mathbf{x}_i'\hat{\boldsymbol{\beta}} \tag{C.9.3}$$

where x'i = [1 X2i X3i ⋯ Xki] and

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \vdots \\ \hat\beta_k \end{bmatrix}$$

Equation (C.9.2) or (C.9.3) is of course the mean prediction of Yi corresponding to given x'i.

If xi is as given in Eq. (C.9.1), Eq. (C.9.3) becomes

$$(\hat Y_0 \mid \mathbf{x}_0') = \mathbf{x}_0'\hat{\boldsymbol{\beta}} \tag{C.9.4}$$

where, of course, the values of x0 are specified. Note that Eq. (C.9.4) gives an unbiased prediction of E(Y0 | x'0), since E(x'0β̂) = x'0β. (Why?)

Variance of Mean Prediction 

The formula to estimate the variance of (Ŷ0 | x'0) is as follows:⁷

$$\text{var}(\hat Y_0 \mid \mathbf{x}_0') = \sigma^2\,\mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0 \tag{C.9.5}$$

where σ² is the variance of ui, x0 are the given values of the X variables for which we wish to predict, and (X'X) is the matrix given in Eq. (C.3.9). In practice, we replace σ² by its unbiased estimator σ̂².

We will illustrate mean prediction and its variance in the next section. 


Individual Prediction 

As pointed out in Chapters 5 and 8, the individual prediction of Y (= Y0) is also given by Eq. (C.9.3) or more specifically by Eq. (C.9.4). The difference between mean and individual predictions lies in their variances.

Variance of Individual Prediction

The formula for the variance of an individual prediction is as follows:⁸

$$\text{var}(Y_0 \mid \mathbf{x}_0) = \sigma^2[1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0] \tag{C.9.6}$$

where var(Y0 | x0) stands for E[Y0 − Ŷ0 | X]². In practice we replace σ² by its unbiased estimator σ̂². We illustrate this formula in the next section.
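Before the fuller example of Section C.10, Eqs. (C.9.4) to (C.9.6) can be sketched on the consumption-income data. The minimal illustration below (plain Python, `fractions.Fraction` for exact arithmetic) uses X0 = 100, an illustrative value chosen here rather than one required by the text. By construction the two variances differ by exactly σ̂².

```python
from fractions import Fraction

n, k = 10, 2
det = Fraction(330000)                                   # det(X'X) for (C.1.6)
XtX_inv = [[322000 / det, -1700 / det],
           [-1700 / det, 10 / det]]
beta = [Fraction(8070000, 330000), Fraction(168000, 330000)]
sigma2_hat = (132100 - (beta[0] * 1110 + beta[1] * 205500)) / (n - k)

def quad(v, M):
    """Quadratic form v'Mv."""
    size = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(size) for j in range(size))

x0 = [1, 100]                                      # illustrative point: X0 = 100
y0_hat = sum(b * x for b, x in zip(beta, x0))      # (C.9.4)
var_mean = sigma2_hat * quad(x0, XtX_inv)          # (C.9.5) with sigma-hat^2
var_indiv = sigma2_hat * (1 + quad(x0, XtX_inv))   # (C.9.6) with sigma-hat^2

print(round(float(y0_hat), 4), round(float(var_mean), 4), round(float(var_indiv), 4))
```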


⁷ For derivation, see J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, pp. 195-196.

⁸ Ibid.
C.10 Summary of the Matrix Approach: An Illustrative Example


Consider the data given in Table C.4. These data pertain to per capita personal consumption expenditure (PPCE) and per capita personal disposable income (PPDI) and time or the trend variable. By including the trend variable in the model, we are trying to find out the relationship of PPCE to PPDI net of the trend variable (which may represent a host of other factors, such as technology, change in tastes, etc.).

For empirical purposes, therefore, the regression model is

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \tag{C.10.1}$$

where Y = per capita consumption expenditure, X2 = per capita disposable income, and X3 = time. The data required to run the regression (C.10.1) are given in Table C.4.

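In matrix notation, our problem may be shown as follows:

$$\begin{bmatrix} 1673 \\ 1688 \\ 1666 \\ 1735 \\ 1749 \\ 1756 \\ 1815 \\ 1867 \\ 1948 \\ 2048 \\ 2128 \\ 2165 \\ 2257 \\ 2316 \\ 2324 \end{bmatrix} = \begin{bmatrix} 1 & 1839 & 1 \\ 1 & 1844 & 2 \\ 1 & 1831 & 3 \\ 1 & 1881 & 4 \\ 1 & 1883 & 5 \\ 1 & 1910 & 6 \\ 1 & 1969 & 7 \\ 1 & 2016 & 8 \\ 1 & 2126 & 9 \\ 1 & 2239 & 10 \\ 1 & 2336 & 11 \\ 1 & 2404 & 12 \\ 1 & 2487 & 13 \\ 1 & 2535 & 14 \\ 1 & 2595 & 15 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 \\ u_7 \\ u_8 \\ u_9 \\ u_{10} \\ u_{11} \\ u_{12} \\ u_{13} \\ u_{14} \\ u_{15} \end{bmatrix} \tag{C.10.2}$$

TABLE C.4 Per Capita Personal Consumption Expenditure (PPCE) and Per Capita Personal Disposable Income (PPDI) in the United States, 1956-1970, in 1958 Dollars

| PPCE, Y | PPDI, X2 | Time, X3 |
|---|---|---|
| 1673 | 1839 | 1 (= 1956) |
| 1688 | 1844 | 2 |
| 1666 | 1831 | 3 |
| 1735 | 1881 | 4 |
| 1749 | 1883 | 5 |
| 1756 | 1910 | 6 |
| 1815 | 1969 | 7 |
| 1867 | 2016 | 8 |
| 1948 | 2126 | 9 |
| 2048 | 2239 | 10 |
| 2128 | 2336 | 11 |
| 2165 | 2404 | 12 |
| 2257 | 2487 | 13 |
| 2316 | 2535 | 14 |
| 2324 | 2595 | 15 (= 1970) |

Source: Economic Report of the President, January 1972, Table B-16.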
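From the preceding data we obtain the following quantities:

$$\bar Y = 1942.333 \qquad \bar X_2 = 2126.333 \qquad \bar X_3 = 8.0$$

$$\sum (Y_i - \bar Y)^2 = 830{,}121.333 \qquad \sum (X_{2i} - \bar X_2)^2 = 1{,}103{,}111.333 \qquad \sum (X_{3i} - \bar X_3)^2 = 280.0$$

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ X_{21} & X_{22} & X_{23} & \cdots & X_{2n} \\ X_{31} & X_{32} & X_{33} & \cdots & X_{3n} \end{bmatrix} \begin{bmatrix} 1 & X_{21} & X_{31} \\ 1 & X_{22} & X_{32} \\ 1 & X_{23} & X_{33} \\ \vdots & \vdots & \vdots \\ 1 & X_{2n} & X_{3n} \end{bmatrix} = \begin{bmatrix} n & \sum X_{2i} & \sum X_{3i} \\ \sum X_{2i} & \sum X_{2i}^2 & \sum X_{2i}X_{3i} \\ \sum X_{3i} & \sum X_{2i}X_{3i} & \sum X_{3i}^2 \end{bmatrix} = \begin{bmatrix} 15 & 31{,}895 & 120 \\ 31{,}895 & 68{,}922{,}513 & 272{,}144 \\ 120 & 272{,}144 & 1{,}240 \end{bmatrix} \tag{C.10.3}$$

$$\mathbf{X}'\mathbf{y} = \begin{bmatrix} \sum Y_i \\ \sum X_{2i}Y_i \\ \sum X_{3i}Y_i \end{bmatrix} = \begin{bmatrix} 29{,}135 \\ 62{,}905{,}821 \\ 247{,}934 \end{bmatrix} \tag{C.10.4}$$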


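Using the rules of matrix inversion given in Appendix B, one can see that

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 37.232491 & -0.0225082 & 1.336707 \\ -0.0225082 & 0.0000137 & -0.0008319 \\ 1.336707 & -0.0008319 & 0.054034 \end{bmatrix} \tag{C.10.5}$$

Therefore,

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \begin{bmatrix} 300.28625 \\ 0.74198 \\ 8.04356 \end{bmatrix} \tag{C.10.6}$$

The residual sum of squares can now be computed as

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = 57{,}420{,}003 - [300.28625\;\; 0.74198\;\; 8.04356]\begin{bmatrix} 29{,}135 \\ 62{,}905{,}821 \\ 247{,}934 \end{bmatrix} \tag{C.10.7}$$

whence we obtain

$$\hat\sigma^2 = \frac{\hat{\mathbf{u}}'\hat{\mathbf{u}}}{12} = \frac{1{,}976.85574}{12} = 164.73797 \tag{C.10.8}$$

The variance-covariance matrix for β̂ can therefore be shown as

$$\text{var-cov}(\hat{\boldsymbol{\beta}}) = \hat\sigma^2(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 6133.650 & -3.70794 & 220.20634 \\ -3.70794 & 0.00226 & -0.13705 \\ 220.20634 & -0.13705 & 8.90155 \end{bmatrix} \tag{C.10.9}$$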
The diagonal elements of this matrix give the variances of β̂1, β̂2, and β̂3, respectively, and their positive square roots give the corresponding standard errors.

From the previous data, it can be readily verified that

$$\text{ESS:}\quad \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar Y^2 = 828{,}144.47786 \tag{C.10.10}$$

$$\text{TSS:}\quad \mathbf{y}'\mathbf{y} - n\bar Y^2 = 830{,}121.333 \tag{C.10.11}$$

Therefore,

$$R^2 = \frac{\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar Y^2}{\mathbf{y}'\mathbf{y} - n\bar Y^2} = \frac{828{,}144.47786}{830{,}121.333} = 0.99761 \tag{C.10.12}$$

Applying Eq. (7.8.4), the adjusted coefficient of determination can be seen to be

$$\bar R^2 = 0.99722 \tag{C.10.13}$$


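Collecting our results thus far, we have

$$\begin{aligned} \hat Y_i &= 300.28625 + 0.74198\,X_{2i} + 8.04356\,X_{3i} \\ \text{se} &= \;\;(78.31763)\qquad\;\; (0.04753)\qquad\quad (2.98354) \\ t &= \;\;(3.83421)\qquad\quad\; (15.60956)\qquad\; (2.69598) \\ R^2 &= 0.99761 \qquad \bar R^2 = 0.99722 \qquad \text{df} = 12 \end{aligned} \tag{C.10.14}$$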


The interpretation of Eq. (C.10.14) is this: If both X2 and X3 are fixed at zero value, the average value of per capita personal consumption expenditure is estimated at about $300. As usual, this mechanical interpretation of the intercept should be taken with a grain of salt. The partial regression coefficient of 0.74198 means that, holding all other variables constant, an increase in per capita income of, say, a dollar is accompanied by an increase in the mean per capita personal consumption expenditure of about 74 cents. In short, the marginal propensity to consume is estimated to be about 0.74, or 74 percent. Similarly, holding all other variables constant, the mean per capita personal consumption expenditure increased at the rate of about $8 per year during the period of the study, 1956-1970. The R² value of 0.9976 shows that the two explanatory variables accounted for over 99 percent of the variation in per capita consumption expenditure in the United States over the period 1956-1970. Although R̄² dips slightly, it is still very high.

Turning to the statistical significance of the estimated coefficients, we see from Eq. (C.10.14) that each of the estimated coefficients is individually statistically significant at, say, the 5 percent level of significance: The ratios of the estimated coefficients to their standard errors (that is, t ratios) are 3.83421, 15.60956, and 2.69598, respectively. Using a two-tail t test at the 5 percent level of significance, we see that the critical t value for 12 df is 2.179. Each of the computed t values exceeds this critical value. Hence, individually we may reject the null hypothesis that the true population value of the relevant coefficient is zero.
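All of the numbers collected in Eq. (C.10.14) can be reproduced from Table C.4 with the formulas of Sections C.3 and C.4. The minimal sketch below uses plain Python. As an assumption of this illustration, it inverts X'X by Gauss-Jordan elimination on `fractions.Fraction` values rather than floating point. That choice sidesteps the rounding problems discussed earlier, since X'X has entries ranging from 15 to about 69 million.

```python
from fractions import Fraction
from math import sqrt

# Table C.4: PPCE (Y), PPDI (X2) and time (X3 = 1 for 1956, ..., 15 for 1970)
Y  = [1673, 1688, 1666, 1735, 1749, 1756, 1815, 1867, 1948, 2048, 2128, 2165, 2257, 2316, 2324]
X2 = [1839, 1844, 1831, 1881, 1883, 1910, 1969, 2016, 2126, 2239, 2336, 2404, 2487, 2535, 2595]
X3 = list(range(1, 16))
X = [[1, a, b] for a, b in zip(X2, X3)]
n, k = len(Y), 3

def inverse(M):
    """Exact inverse of a nonsingular square matrix by Gauss-Jordan elimination on Fractions."""
    size = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(size)]
           for i, row in enumerate(M)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]  # (C.10.3)
Xty = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(k)]               # (C.10.4)
XtX_inv = inverse(XtX)                                                          # (C.10.5)
beta = [sum(XtX_inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]        # (C.10.6)

yty = sum(y * y for y in Y)
correction = Fraction(sum(Y) ** 2, n)                      # n * Ybar^2
rss = yty - sum(b * c for b, c in zip(beta, Xty))          # (C.10.7)
sigma2_hat = rss / (n - k)                                 # (C.10.8)
se = [sqrt(sigma2_hat * XtX_inv[i][i]) for i in range(k)]  # from the diagonal of (C.10.9)
t_ratios = [float(b) / s for b, s in zip(beta, se)]
r2 = 1 - rss / (yty - correction)                          # equals ESS/TSS, (C.10.12)
r2_adj = 1 - (1 - r2) * Fraction(n - 1, n - k)             # adjusted R^2

print([round(float(b), 5) for b in beta])
print([round(s, 5) for s in se], [round(t, 5) for t in t_ratios])
print(round(float(rss), 5), round(float(r2), 5), round(float(r2_adj), 5))
```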

As noted previously, we cannot apply the usual t test to test the hypothesis that β₂ = β₃ = 0 simultaneously, because the t-test procedure assumes that an independent sample is drawn every time the t test is applied. If the same sample is used to test hypotheses about β₂ and β₃ simultaneously, the estimators β̂₂ and β̂₃ are likely to be correlated, thus violating the assumption underlying the t-test procedure.⁹ As a matter of fact, a look at the variance-covariance matrix of β̂ given in Eq. (C.10.9) shows that the estimators β̂₂ and β̂₃ are negatively correlated (the covariance between the two is −0.13705). Hence we cannot use the t test to test the null hypothesis that β₂ = β₃ = 0.

TABLE C.5 The ANOVA Table for the Data of Table C.4

Source of Variation    SS              df   MSS
Due to X₂, X₃          828,144.47786    2   414,072.2389
Due to residuals         1,976.85574   12       164.73797
Total                  830,121.33360   14

But recall that a null hypothesis like β₂ = β₃ = 0, simultaneously, can be tested by the analysis of variance technique and the attendant F test, which were introduced in Chapter 8. For our problem, the analysis of variance table is Table C.5. Under the usual assumptions, we obtain

$$F = \frac{414{,}072.2389}{164.73797} = 2513.52 \tag{C.10.15}$$

which is distributed as the F distribution with 2 and 12 df. The computed F value is obviously highly significant; we can reject the null hypothesis that β₂ = β₃ = 0, that is, that per capita personal consumption expenditure is not linearly related to per capita disposable income and trend.
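As a numerical check on Eq. (C.10.15), the sketch below forms F from the sums of squares in Table C.5 and attaches an upper-tail probability. It relies on the standard identity Pr(F > f) = I_z(N₂/2, N₁/2) with z = N₂/(N₂ + N₁f), where I is the regularized incomplete beta function. Evaluating that integral by Simpson's rule, and the helper names, are assumptions of this sketch. It is restricted to N₁, N₂ ≥ 2 so that the integrand has no endpoint singularity.

```python
import math

def beta_fn(a: float, b: float) -> float:
    """Complete beta function B(a, b) via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def reg_inc_beta(z: float, a: float, b: float, n: int = 4000) -> float:
    """Regularized incomplete beta I_z(a, b) by Simpson's rule; assumes a, b >= 1."""
    f = lambda t: t ** (a - 1) * (1 - t) ** (b - 1)
    h = z / n
    s = f(0.0) + f(z)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3 / beta_fn(a, b)

def f_upper_tail(f_value: float, df1: int, df2: int) -> float:
    """Pr(F > f_value) for an F(df1, df2) variable, df1 and df2 >= 2."""
    z = df2 / (df2 + df1 * f_value)
    return reg_inc_beta(z, df2 / 2, df1 / 2)

# Sums of squares and df from Table C.5
ess, df_ess = 828_144.47786, 2
rss, df_rss = 1_976.85574, 12

f_stat = (ess / df_ess) / (rss / df_rss)
print(f"F = {f_stat:.2f}")
print(f"Pr(F > {f_stat:.2f}) = {f_upper_tail(f_stat, df_ess, df_rss):.3g}")
```

The same `f_upper_tail` helper can be used to spot-check entries of Table D.3, for example the 5 percent point 3.89 for 2 and 12 df.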

In Section C.9 we discussed the mechanics of forecasting, mean as well as individual. Assume that for 1971 the PPDI figure is $2,610 and we wish to forecast the PPCE corresponding to this figure. Then the mean and the individual forecasts of PPCE for 1971 are the same and are given by

$$(\widehat{\text{PPCE}}_{1971} \mid \text{PPDI}_{1971},\ X_3 = 16) = \mathbf{x}_{1971}'\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 & 2610 & 16 \end{bmatrix} \begin{bmatrix} 300.28625 \\ 0.74198 \\ 8.04356 \end{bmatrix} = 2365.55 \tag{C.10.16}$$

where use is made of Eq. (C.9.3).
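The row-vector-times-column-vector product in Eq. (C.10.16) is simply the sum of each regressor value times its estimated coefficient. A minimal plain-Python sketch, using only the numbers shown in the equation:

```python
# Estimated coefficients from Eq. (C.10.16): intercept, PPDI slope, trend slope
beta_hat = [300.28625, 0.74198, 8.04356]
# Regressor vector for 1971: constant, PPDI = 2610 dollars, trend X3 = 16
x_1971 = [1.0, 2610.0, 16.0]

# x'beta-hat written out as an inner product
forecast = sum(x * b for x, b in zip(x_1971, beta_hat))
print(f"Forecast PPCE for 1971: {forecast:.2f}")
```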

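The variances of Ŷ₁₉₇₁ and Y₁₉₇₁, as we know from Section C.9, are different and are as follows:

$$\operatorname{var}(\hat Y_{1971} \mid \mathbf{x}_{1971}') = \hat\sigma^2\left[\mathbf{x}_{1971}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_{1971}\right] = 164.73797 \begin{bmatrix} 1 & 2610 & 16 \end{bmatrix} (\mathbf{X}'\mathbf{X})^{-1} \begin{bmatrix} 1 \\ 2610 \\ 16 \end{bmatrix} \tag{C.10.17}$$

where (X'X)⁻¹ is as shown in Eq. (C.10.5). Substituting this into Eq. (C.10.17), the reader should verify that

$$\operatorname{var}(\hat Y_{1971} \mid \mathbf{x}_{1971}') = 48.6426 \tag{C.10.18}$$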


⁹ See Section 8.4 for details.
and therefore

$$\operatorname{se}(\hat Y_{1971} \mid \mathbf{x}_{1971}') = 6.9744$$

We leave it to the reader to verify, using Eq. (C.9.6), that

$$\operatorname{var}(Y_{1971} \mid \mathbf{x}_{1971}') = 213.3806 \tag{C.10.19}$$

and

$$\operatorname{se}(Y_{1971} \mid \mathbf{x}_{1971}') = 14.6076$$

Note:

$$\operatorname{var}(Y_{1971} \mid \mathbf{x}_{1971}') = E\left[Y_{1971} - \hat Y_{1971} \mid \mathbf{x}_{1971}'\right]^2$$
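Because (X'X)⁻¹ from Eq. (C.10.5) is not reproduced here, the sketch below takes the mean-forecast variance 48.6426 of Eq. (C.10.18) as given. It then applies Eq. (C.9.6), which adds σ̂² for the disturbance of the new observation, i.e. σ̂²[1 + x′₁₉₇₁(X'X)⁻¹x₁₉₇₁]. The 95 percent interval printed at the end is an extra illustration, not a figure from the text; it uses the 12-df critical value 2.179 quoted earlier.

```python
import math

sigma2_hat = 164.73797   # residual mean square, Table C.5
var_mean = 48.6426       # sigma2_hat * x0'(X'X)^-1 x0, Eq. (C.10.18)
forecast = 2365.55       # point forecast, Eq. (C.10.16)
t_crit = 2.179           # two-tail 5 percent t value, 12 df

# Individual forecast: sigma2_hat * [1 + x0'(X'X)^-1 x0] = sigma2_hat + var_mean
var_indiv = sigma2_hat + var_mean

se_mean = math.sqrt(var_mean)
se_indiv = math.sqrt(var_indiv)
print(f"se(mean forecast)       = {se_mean:.4f}")
print(f"var(individual)         = {var_indiv:.4f}")
print(f"se(individual forecast) = {se_indiv:.4f}")
print(f"95% interval for Y_1971: {forecast - t_crit * se_indiv:.2f} "
      f"to {forecast + t_crit * se_indiv:.2f}")
```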

In Section C.5 we introduced the correlation matrix R. For our data, the correlation matrix is as follows:

$$\mathbf{R} = \begin{array}{c|ccc} & Y & X_2 & X_3 \\ \hline Y & 1 & 0.9980 & 0.9743 \\ X_2 & 0.9980 & 1 & 0.9664 \\ X_3 & 0.9743 & 0.9664 & 1 \end{array} \tag{C.10.20}$$

Note that in Eq. (C.10.20) we have bordered the correlation matrix by the variables of the model so that we can readily identify which variables are involved in the computation of each correlation coefficient. Thus, the coefficient 0.9980 in the first row of matrix (C.10.20) tells us that it is the correlation coefficient between Y and X₂ (that is, r₁₂). From the zero-order correlations given in the correlation matrix (C.10.20) one can easily derive the first-order partial correlation coefficients. (See Exercise C.7.)
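The first-order partial correlations mentioned above follow from the Chapter 7 formula r₁₂.₃ = (r₁₂ − r₁₃r₂₃)/√[(1 − r₁₃²)(1 − r₂₃²)], which is restated in Exercise C.4. Below is a minimal plain-Python sketch that applies it to the entries of Eq. (C.10.20). The function names are illustrative, and the printed values can be compared with one's own answer to Exercise C.7.

```python
import math

# Zero-order correlations from Eq. (C.10.20); 1 = Y, 2 = X2, 3 = X3
r = {(1, 2): 0.9980, (1, 3): 0.9743, (2, 3): 0.9664}

def corr(i: int, j: int) -> float:
    """Zero-order correlation r_ij, whatever the order of the indices."""
    return r[(min(i, j), max(i, j))]

def first_order_partial(i: int, j: int, k: int) -> float:
    """r_{ij.k}: correlation between variables i and j holding k constant."""
    num = corr(i, j) - corr(i, k) * corr(j, k)
    den = math.sqrt((1 - corr(i, k) ** 2) * (1 - corr(j, k) ** 2))
    return num / den

for i, j, k in [(1, 2, 3), (1, 3, 2), (2, 3, 1)]:
    print(f"r_{i}{j}.{k} = {first_order_partial(i, j, k):.4f}")
```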


C.11 Generalized Least Squares (GLS)

On several occasions we have mentioned that OLS is a special case of GLS. To see this, return to Eq. (C.2.2). To take into account heteroscedastic variances (the elements on the main diagonal of Eq. [C.2.2]) and autocorrelations in the error terms (the elements off the main diagonal of Eq. [C.2.2]), assume that

$$E(\mathbf{u}\mathbf{u}') = \sigma^2\mathbf{V} \tag{C.11.1}$$

where V is a known n × n matrix.

Our model is therefore

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$$

where E(u) = 0 and var-cov(u) = σ²V. If σ² is unknown, which is typically the case, V then represents the assumed structure of the variances and covariances among the random errors uₜ.

Under the stated condition on the variance-covariance matrix of the error terms, it can be shown that

$$\hat{\boldsymbol{\beta}}^{\text{GLS}} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y} \tag{C.11.2}$$

β̂^GLS is known as the generalized least-squares (GLS) estimator of β.

It can also be shown that

$$\text{var-cov}(\hat{\boldsymbol{\beta}}^{\text{GLS}}) = \sigma^2(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1} \tag{C.11.3}$$

It can be proved that β̂^GLS is the best linear unbiased estimator of β.
If it is assumed that the variance of each error term is the same constant σ² and the error terms are mutually uncorrelated, then the V matrix reduces to the identity matrix, as shown in Eq. (C.2.3). If the error terms are mutually uncorrelated but they have different (i.e., heteroscedastic) variances, then the V matrix will be diagonal, with the unequal variances along the main diagonal. Of course, if there is heteroscedasticity as well as autocorrelation, then the V matrix will have entries on the main diagonal as well as off the diagonal.
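To make the role of V concrete, here is a small plain-Python sketch of Eq. (C.11.2). The list-of-rows matrix helpers and the five-observation data set are hypothetical, chosen only for illustration, and V is assumed known, as the text requires. With V = I the estimator collapses to OLS. With a diagonal V it becomes weighted least squares, each observation weighted by the reciprocal of its (relative) variance.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product AB for list-of-rows matrices."""
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inverse(A):
    """Inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [row[n:] for row in M]

def gls(X, y, V):
    """GLS estimator (X'V^-1 X)^-1 X'V^-1 y of Eq. (C.11.2); y is an n x 1 column."""
    XtVinv = matmul(transpose(X), inverse(V))
    return matmul(inverse(matmul(XtVinv, X)), matmul(XtVinv, y))

# Hypothetical data: a constant plus one regressor
x = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.0, xi] for xi in x]
y = [[2.1], [3.9], [6.2], [7.8], [10.1]]

V_identity = [[float(i == j) for j in range(5)] for i in range(5)]
# Assumed structure: var(u_i) proportional to x_i^2, no autocorrelation
V_hetero = [[x[i] ** 2 if i == j else 0.0 for j in range(5)] for i in range(5)]

print("V = I (OLS):     ", gls(X, y, V_identity))
print("diagonal V (WLS):", gls(X, y, V_hetero))
```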

The real problem in practice is that we know neither σ² nor the true variances and covariances (i.e., the structure of the V matrix). As a solution, we can use the method of estimated (or feasible) generalized least squares (EGLS). Here we first estimate our model by OLS, disregarding the problems of heteroscedasticity and/or autocorrelation. We obtain the residuals from this model and form the (estimated) variance-covariance matrix of the error term by replacing the entries in the expression just before Eq. (C.2.2) by the estimated u, namely, û. It can be shown that, under suitable conditions, the EGLS estimators are consistent. Symbolically,

$$\hat{\boldsymbol{\beta}}^{\text{EGLS}} = (\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1}(\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{y}) \tag{C.11.4}$$

$$\text{var-cov}(\hat{\boldsymbol{\beta}}^{\text{EGLS}}) = \hat\sigma^2(\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1} \tag{C.11.5}$$

where V̂ is an estimate of V.
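The text does not commit to a particular way of building V̂. One common textbook variant is sketched below under the assumption of multiplicative heteroscedasticity, var(uᵢ) = exp(γ₀ + γ₁Xᵢ), with no autocorrelation. It runs OLS, regresses ln ûᵢ² on Xᵢ to estimate the variance function, and then applies GLS with the implied diagonal V̂. Both the variance model and the single-regressor setup are assumptions made to keep the example short. Other error structures, such as autocorrelated errors, call for a different V̂.

```python
import math

def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x with weights w_i (1 / variance)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

def egls_line(x, y):
    """Feasible GLS for y = a + b*x, assuming var(u_i) = exp(g0 + g1*x_i).

    Step 1: OLS, ignoring heteroscedasticity.
    Step 2: regress ln(u_hat^2) on x to estimate the variance function.
    Step 3: GLS (here weighted least squares) with the estimated diagonal V.
    """
    ones = [1.0] * len(x)
    a, b = wls_line(x, y, ones)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    log_u2 = [math.log(max(u * u, 1e-12)) for u in resid]  # floor guards log(0)
    g0, g1 = wls_line(x, log_u2, ones)
    weights = [math.exp(-(g0 + g1 * xi)) for xi in x]
    return wls_line(x, y, weights)

# Hypothetical data whose error spread grows with x
x = [float(i) for i in range(1, 21)]
y = [2.0 + 3.0 * xi + 0.2 * xi * math.sin(1.7 * xi) for xi in x]

print("OLS :", wls_line(x, y, [1.0] * len(x)))
print("EGLS:", egls_line(x, y))
```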

C.12 Summary and Conclusions 

The primary purpose of this appendix was to introduce the matrix approach to the classical linear regression model. Although very few new concepts of regression analysis were introduced, the matrix notation provides a compact method of dealing with linear regression models involving any number of variables.

In concluding this appendix, note that if the Y and X variables are measured in the deviation form, that is, as deviations from their sample means, there are a few changes in the formulas presented previously. These changes are listed in Table C.6.¹⁰ As this table shows, in the deviation form the correction for the mean, nȲ², drops out of the TSS and the ESS. (Why?) This changes the formula for R². Otherwise, most of the formulas developed in the original units of measurement hold true for the deviation form.

TABLE C.6 k-Variable Regression Model in Original Units and in the Deviation Form*

Original units                                 Deviation form
y = Xβ + u                         (C.3.2)     y = Xβ + u; the column of 1's in the X matrix drops out. (Why?)
β̂ = (X'X)⁻¹X'y                     (C.3.11)    Same
var-cov(β̂) = σ²(X'X)⁻¹             (C.3.13)    Same
û'û = y'y − β̂'X'y                  (C.3.18)    Same
Σyᵢ² = y'y − nȲ²                   (C.3.16)    Σyᵢ² = y'y                  (C.12.1)
ESS = β̂'X'y − nȲ²                  (C.3.17)    ESS = β̂'X'y                 (C.12.2)
R² = (β̂'X'y − nȲ²)/(y'y − nȲ²)     (C.4.2)     R² = β̂'X'y / y'y            (C.12.3)

*Note that although in both cases the symbols for the matrices and vectors are the same, in the deviation form the elements of the matrices and vectors are assumed to be deviations rather than the raw data. Note also that in the deviation form β̂ is of order k − 1 and var-cov(β̂) is of order (k − 1) × (k − 1).

¹⁰ In these days of high-speed computers there may be no need for the deviation form. But it simplifies formulas, and therefore calculations, if one is working with a desk calculator and dealing with large numbers.
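As a check on Table C.6, the slope coefficients reported in Section C.10 can be recovered from the deviation-form cross products listed in Exercise C.1. This only requires solving the 2 × 2 normal equations (X'X)β̂ = X'y, with no column of 1's. Here is a minimal sketch using Cramer's rule; the helper name is illustrative.

```python
# Deviation-form cross products reported in Exercise C.1 (data of Section C.10)
xtx = [[1_103_111.333, 16_984.0],
       [16_984.0, 280.0]]
xty = [955_099.333, 14_854.000]

def solve_2x2(a, b):
    """Solve the 2 x 2 system a @ beta = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    beta2 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    beta3 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return beta2, beta3

beta2, beta3 = solve_2x2(xtx, xty)
print(f"beta2 = {beta2:.5f}, beta3 = {beta3:.5f}")
```

The results agree with the slopes 0.74198 and 8.04356 of Eq. (C.10.16) to the precision shown in the text.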


C.1. For the illustrative example discussed in Section C.10, the X'X and X'y using the data in the deviation form are as follows:

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1{,}103{,}111.333 & 16{,}984 \\ 16{,}984 & 280 \end{bmatrix} \qquad \mathbf{X}'\mathbf{y} = \begin{bmatrix} 955{,}099.333 \\ 14{,}854.000 \end{bmatrix}$$

a. Estimate β₂ and β₃.

b. How would you estimate β₁?

c. Obtain the variances of β̂₂ and β̂₃ and their covariance.

d. Obtain R² and R̄².

e. Comparing your results with those given in Section C.10, what do you find are the advantages of the deviation form?

C.2. Refer to Exercise 22.23. Using the data given therein, set up the appropriate (X'X) matrix and the X'y vector and estimate the parameter vector β and its variance-covariance matrix. Also obtain R². How would you test the hypothesis that the elasticities of M1 with respect to GDP and interest rate R are numerically the same?
C.3. Testing the equality of two regression coefficients. Suppose that you are given the following regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

and you want to test the hypothesis that β₂ = β₃. If we assume that the uᵢ are normally distributed, it can be shown that, under the null hypothesis,

$$t = \frac{\hat\beta_2 - \hat\beta_3}{\sqrt{\operatorname{var}(\hat\beta_2) + \operatorname{var}(\hat\beta_3) - 2\operatorname{cov}(\hat\beta_2, \hat\beta_3)}}$$

follows the t distribution with n − 3 df (see Section 8.5). (In general, for the k-variable case the df are n − k.) Therefore, the preceding t test can be used to test the null hypothesis β₂ = β₃.

Apply the preceding t test to test the hypothesis that the true values of β₂ and β₃ in the regression (C.10.14) are identical.

Hint: Use the var-cov matrix of β̂ given in Eq. (C.10.9).
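The t statistic above translates directly into a one-line function. The inputs in the example call are hypothetical numbers, not the values of Eq. (C.10.9), which are not reproduced here.

```python
import math

def t_equal_coefs(b2: float, b3: float, var2: float, var3: float, cov23: float) -> float:
    """t statistic for H0: beta2 = beta3, as given in Exercise C.3."""
    return (b2 - b3) / math.sqrt(var2 + var3 - 2 * cov23)

# Hypothetical inputs, for illustration only
t = t_equal_coefs(b2=0.9, b3=0.6, var2=0.04, var3=0.05, cov23=-0.01)
print(f"t = {t:.4f}")  # compare with Table D.2 at n - 3 df
```

The same function applies to Exercise C.11(b) once the reported standard errors are squared to give variances.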

C.4. Expressing higher-order correlations in terms of lower-order correlations. Correlation coefficients of order p can be expressed in terms of correlation coefficients of order p − 1 by the following reduction formula:

$$r_{12.345\ldots p} = \frac{r_{12.345\ldots(p-1)} - r_{1p.345\ldots(p-1)}\, r_{2p.345\ldots(p-1)}}{\sqrt{1 - r_{1p.345\ldots(p-1)}^2}\,\sqrt{1 - r_{2p.345\ldots(p-1)}^2}}$$

Thus,

$$r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

as found in Chapter 7.

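You are given the following correlation matrix:

$$\mathbf{R} = \begin{array}{c|ccccc} & Y & X_2 & X_3 & X_4 & X_5 \\ \hline Y & 1 & 0.44 & -0.34 & -0.31 & -0.14 \\ X_2 & & 1 & 0.25 & -0.19 & -0.35 \\ X_3 & & & 1 & 0.44 & 0.33 \\ X_4 & & & & 1 & 0.85 \\ X_5 & & & & & 1 \end{array}$$

Find the following:

a. r₁₂.₃₄₅   b. r₁₂.₃₄   c. r₁₂.₃

d. r₁₃.₂₄₅   e. r₁₃.₂₄   f. r₁₃.₂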
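The reduction formula lends itself to recursion: each call peels off the last conditioning variable and reuses lower-order partials. The sketch below is one way to code it in plain Python. It enters the Exercise C.4 matrix as a dictionary of upper-triangle entries (1 = Y, 2 to 5 = X₂ to X₅). The function name and the data layout are illustrative choices. Memoization via `functools.lru_cache` avoids recomputing shared lower-order terms.

```python
import math
from functools import lru_cache

# Upper triangle of the Exercise C.4 correlation matrix (1 = Y, 2..5 = X2..X5)
R = {
    (1, 2): 0.44, (1, 3): -0.34, (1, 4): -0.31, (1, 5): -0.14,
    (2, 3): 0.25, (2, 4): -0.19, (2, 5): -0.35,
    (3, 4): 0.44, (3, 5): 0.33,
    (4, 5): 0.85,
}

@lru_cache(maxsize=None)
def partial_corr(i: int, j: int, given: tuple = ()) -> float:
    """r_{ij.given}, built up by the reduction formula of Exercise C.4.

    The last variable in `given` plays the role of p in the formula.
    """
    if not given:
        return R[(min(i, j), max(i, j))]
    rest, p = given[:-1], given[-1]
    r_ij = partial_corr(i, j, rest)
    r_ip = partial_corr(i, p, rest)
    r_jp = partial_corr(j, p, rest)
    return (r_ij - r_ip * r_jp) / math.sqrt((1 - r_ip ** 2) * (1 - r_jp ** 2))

for label, given in [("r_12.3", (3,)), ("r_12.34", (3, 4)), ("r_12.345", (3, 4, 5))]:
    print(f"{label:9s} = {partial_corr(1, 2, given):.4f}")
```

A partial correlation does not depend on the order in which the conditioning variables are removed. So `partial_corr(1, 2, (3, 4, 5))` and `partial_corr(1, 2, (5, 4, 3))` should agree, which makes a handy self-check.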

C.5. Expressing higher-order regression coefficients in terms of lower-order regression coefficients. A regression coefficient of order p can be expressed in terms of regression coefficients of order p − 1 by the following reduction formula:

$$\hat\beta_{12.345\ldots p} = \frac{\hat\beta_{12.345\ldots(p-1)} - \hat\beta_{1p.345\ldots(p-1)}\,\hat\beta_{p2.345\ldots(p-1)}}{1 - \hat\beta_{2p.345\ldots(p-1)}\,\hat\beta_{p2.345\ldots(p-1)}}$$

Thus,

$$\hat\beta_{12.3} = \frac{\hat\beta_{12} - \hat\beta_{13}\,\hat\beta_{32}}{1 - \hat\beta_{23}\,\hat\beta_{32}}$$

where β̂₁₂.₃ is the slope coefficient in the regression of Y on X₂ holding X₃ constant. Similarly, β̂₁₂.₃₄ is the slope coefficient in the regression of Y on X₂ holding X₃ and X₄ constant, and so on.

Using the preceding formula, find expressions for the following regression coefficients in terms of lower-order regression coefficients: β̂₁₂.₃₄₅₆, β̂₁₂.₃₄₅, and β̂₁₂.₃₄.
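The p = 3 case of the formula can be checked numerically. The check computes the four simple-regression slopes, applies the formula, and compares the result with the X₂ coefficient from the two-regressor regression. The six observations below are made up for illustration, and the helper names are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)

def slope(dep, reg):
    """Slope from the simple regression (with intercept) of dep on reg."""
    md, mr = mean(dep), mean(reg)
    sxy = sum((d - md) * (r - mr) for d, r in zip(dep, reg))
    sxx = sum((r - mr) ** 2 for r in reg)
    return sxy / sxx

def coef_x2(y, x2, x3):
    """Coefficient of X2 when Y is regressed on X2 and X3 (deviation-form formula)."""
    dy = [v - mean(y) for v in y]
    d2 = [v - mean(x2) for v in x2]
    d3 = [v - mean(x3) for v in x3]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    sy2 = sum(a * b for a, b in zip(dy, d2))
    sy3 = sum(a * b for a, b in zip(dy, d3))
    return (sy2 * s33 - sy3 * s23) / (s22 * s33 - s23 ** 2)

# Made-up data: variable 1 = Y, variable 2 = X2, variable 3 = X3
y = [10.0, 12.0, 15.0, 14.0, 18.0, 21.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

b12, b13 = slope(y, x2), slope(y, x3)    # Y on X2, Y on X3
b23, b32 = slope(x2, x3), slope(x3, x2)  # X2 on X3, X3 on X2
via_formula = (b12 - b13 * b32) / (1 - b23 * b32)
print(f"reduction formula: {via_formula:.6f}")
print(f"direct regression: {coef_x2(y, x2, x3):.6f}")
```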

C.6. Establish the following identity:

$$\hat\beta_{12.3}\,\hat\beta_{23.1}\,\hat\beta_{31.2} = r_{12.3}\, r_{23.1}\, r_{31.2}$$


C.7. For the correlation matrix R given in Eq. (C.10.20), find all the first-order partial correlation coefficients.

C.8. In studying the variation in crime rates in certain large cities in the United States, Ogburn obtained the following data:*

Variable   Mean          Standard deviation
Y          Ȳ = 19.9      S₁ = 7.9
X₂         X̄₂ = 49.2     S₂ = 1.3
X₃         X̄₃ = 10.2     S₃ = 4.6
X₄         X̄₄ = 481.4    S₄ = 74.4
X₅         X̄₅ = 41.6     S₅ = 10.8

$$\mathbf{R} = \begin{array}{c|ccccc} & Y & X_2 & X_3 & X_4 & X_5 \\ \hline Y & 1 & 0.44 & -0.34 & -0.31 & -0.14 \\ X_2 & & 1 & 0.25 & -0.19 & -0.35 \\ X_3 & & & 1 & 0.44 & 0.33 \\ X_4 & & & & 1 & 0.85 \\ X_5 & & & & & 1 \end{array}$$

where Y = crime rate, number of known offenses per thousand of population
X₂ = percentage of male inhabitants
X₃ = percentage of total inhabitants who are foreign-born males
X₄ = number of children under 5 years of age per thousand married women between ages 15 and 44 years
X₅ = church membership, number of church members 13 years of age and over per 100 of total population 13 years of age and over

S₁ to S₅ are the sample standard deviations of variables Y through X₅, and R is the correlation matrix.

a. Treating Y as the dependent variable, obtain the regression of Y on the four X variables and interpret the estimated regression.

b. Obtain r₁₂.₃₄₅, r₁₄.₂₃₅, and r₁₅.₂₃₄.

c. Obtain R² and test the hypothesis that all partial slope coefficients are simultaneously equal to zero.

*Ogburn, "Factors in the Variation of Crime among Cities," Journal of the American Statistical Association, vol. 30, 1935, p. 12.

C.9. The following table gives data on output and total cost of production of a commodity in the short run. (See Example 7.4.)

Output   Total Cost, $
1        193
2        226
3        240
4        244
5        257
6        260
7        274
8        297
9        350
10       420

To test whether the preceding data suggest the U-shaped average and marginal cost curves typically encountered in the short run, one can use the following model:

$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_i$$

where Y = total cost and X = output. The additional explanatory variables Xᵢ² and Xᵢ³ are derived from X.

a. Express the data in the deviation form and obtain (X'X), (X'y), and (X'X)⁻¹.

b. Estimate β₂, β₃, and β₄.

c. Estimate the var-cov matrix of β̂.

d. Estimate β₁. Interpret β̂₁ in the context of the problem.

e. Obtain R² and R̄².

f. A priori, what are the signs of β₂, β₃, and β₄? Why?

g. From the total cost function given previously, obtain expressions for the marginal and average cost functions.

h. Fit the average and marginal cost functions to the data and comment on the fit.

i. If β₃ = β₄ = 0, what is the nature of the marginal cost function? How would you test the hypothesis that β₃ = β₄ = 0?

j. How would you derive the total variable cost and average variable cost functions from the given data?
TABLE C.7 Labor Force Participation Experience of the Urban Poor: Census Tracts, New York City, 1970

Tract No.   % in Labor Force, Y*   Mean Family Income, X₂†   Mean Family Size, X₃   Unemployment Rate, X₄‡
137         64.3                   1,998                     2.95                   4.4
139         45.4                   1,114                     3.40                   3.4
141         26.6                   1,942                     3.72                   1.1
142         87.5                   1,998                     4.43                   3.1
143         71.3                   2,026                     3.82                   7.7
145         82.4                   1,853                     3.90                   5.0
147         26.3                   1,666                     3.32                   6.2
149         61.6                   1,434                     3.80                   5.4
151         52.9                   1,513                     3.49                   12.2
153         64.7                   2,008                     3.85                   4.8
155         64.9                   1,704                     4.69                   2.9
157         70.5                   1,525                     3.89                   4.8
159         87.2                   1,842                     3.53                   3.9
161         81.2                   1,735                     4.96                   7.2
163         67.9                   1,639                     3.68                   3.6

* Y = family heads under 65 years old.
† X₂ is measured in dollars.
‡ X₄ = percent of civilian labor force unemployed.

Source: Census Tracts: New York, Bureau of the Census, U.S. Department of Commerce, 1970.

C.10. In order to study the labor force participation of urban poor families (families earning less than $3,943 in 1969), the data in Table C.7 were obtained from the 1970 Census of Population.

a. Using the regression model Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + β₄X₄ᵢ + uᵢ, obtain the estimates of the regression coefficients and interpret your results.

b. A priori, what are the expected signs of the regression coefficients in the preceding model and why?

c. How would you test the hypothesis that the overall unemployment rate has no effect on the labor force participation of the urban poor in the census tracts given in the accompanying table?

d. Should any variables be dropped from the preceding model? Why?

e. What other variables would you consider for inclusion in the model?

C.11. In an application of the Cobb-Douglas production function the following results were obtained:

$$\widehat{\ln Y_i} = 2.3542 + \underset{(0.3022)}{0.9576}\,\ln X_{2i} + \underset{(0.3571)}{0.8242}\,\ln X_{3i}$$

$$R^2 = 0.8432 \qquad \text{df} = 12$$

where Y = output, X₂ = labor input, and X₃ = capital input, and where the figures in parentheses are the estimated standard errors.

a. As noted in Chapter 7, the coefficients of the labor and capital inputs in the preceding equation give the elasticities of output with respect to labor and capital. Test the hypothesis that these elasticities are individually equal to unity.

b. Test the hypothesis that the labor and capital elasticities are equal, assuming (i) the covariance between the estimated labor and capital coefficients is zero, and (ii) it is −0.0972.

c. How would you test the overall significance of the preceding regression equation?
*C.12. Express the likelihood function for the k-variable regression model in matrix notation and show that β̃, the vector of maximum likelihood estimators, is identical to β̂, the vector of OLS estimators of the k-variable regression model.

C.13. Regression using standardized variables. Consider the following sample regression functions (SRFs):

$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat u_i \tag{1}$$

$$Y_i^* = b_1 + b_2 X_{2i}^* + b_3 X_{3i}^* + \hat u_i^* \tag{2}$$

where

$$Y_i^* = \frac{Y_i - \bar Y}{s_Y}, \qquad X_{2i}^* = \frac{X_{2i} - \bar X_2}{s_2}, \qquad X_{3i}^* = \frac{X_{3i} - \bar X_3}{s_3}$$

and the s's denote the sample standard deviations. As noted in Chapter 6, Section 6.3, the starred variables above are known as the standardized variables. These variables have zero means and unit (= 1) standard deviations. Expressing all the variables in the deviation form, show the following for model (2):

e. b₁ = 0

Also establish the relationship between the b's and the β's.

(Note that in the preceding relations n denotes the sample size; r₁₂, r₁₃, and r₂₃ denote the correlations between Y and X₂, between Y and X₃, and between X₂ and X₃, respectively.)

C.14. Verify Eqs. (C.10.18) and (C.10.19). 

C.15. Constrained least squares. Assume

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{u} \tag{1}$$

which we want to estimate subject to a set of equality restrictions or constraints:

$$\mathbf{R}\boldsymbol\beta = \mathbf{r} \tag{2}$$

where R is a known matrix of order q × k (q < k) and r is a known vector of q elements. To illustrate, suppose our model is

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + u_i \tag{3}$$

and suppose we want to estimate this model subject to these restrictions:

$$\beta_2 - \beta_3 = 0, \qquad \beta_4 + \beta_5 = 1 \tag{4}$$

We can use some of the techniques discussed in Chapter 8 to incorporate these restrictions (e.g., β₂ = β₃ and β₄ = 1 − β₅, thus removing β₂ and β₄ from the model) and test for the validity of these restrictions by the F test discussed there. But a more direct way of estimating Eq. (3) with the restrictions (4) built into the estimating procedure is first to express the restrictions in the form of Eq. (2), which in the present case becomes

$$\mathbf{R} = \begin{bmatrix} 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} \qquad \mathbf{r} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{5}$$

Letting β̂* denote the restricted least-squares or constrained least-squares estimator, one can show that β̂* can be estimated by the following formula:*

$$\hat{\boldsymbol\beta}^* = \hat{\boldsymbol\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{R}'\left[\mathbf{R}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{R}'\right]^{-1}(\mathbf{r} - \mathbf{R}\hat{\boldsymbol\beta}) \tag{6}$$

where β̂ is the usual (unconstrained) estimator obtained from the usual formula (X'X)⁻¹X'y.

a. What is the β vector in Eq. (3)?

b. Given this β vector, verify that the R matrix and r vector given in Eq. (5) do in fact incorporate the restrictions in Eq. (4).

c. Write down the R and r in the following cases:

(i) β₂ = β₃ = β₄ = 2

(ii) β₂ = β₃ and β₄ = β₅

(iii) β₂ − 3β₃ = 5β₄

(iv) β₂ + 3β₃ = 0

d. When will β̂* = β̂?

*Optional.
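Eq. (6) can be sketched in plain Python as follows. The data set and the single restriction β₂ + β₃ = 1 are hypothetical; they are not the model of Eq. (3). The list-of-rows matrix helpers are written out so that the block is self-contained. The restricted estimate should satisfy Rβ̂* = r up to rounding error.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inverse(A):
    """Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [row[n:] for row in M]

def restricted_ls(X, y, R, r):
    """Return the unrestricted beta-hat and the restricted beta* of Eq. (6)."""
    XtX_inv = inverse(matmul(transpose(X), X))
    beta = matmul(XtX_inv, matmul(transpose(X), y))       # (X'X)^-1 X'y
    A = matmul(XtX_inv, transpose(R))                     # (X'X)^-1 R'
    middle = inverse(matmul(R, A))                        # [R (X'X)^-1 R']^-1
    gap = [[ri[0] - rb[0]] for ri, rb in zip(r, matmul(R, beta))]  # r - R beta-hat
    adj = matmul(A, matmul(middle, gap))
    beta_star = [[b[0] + a[0]] for b, a in zip(beta, adj)]
    return beta, beta_star

# Hypothetical data: Y on a constant, X2 and X3
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [[3.1], [3.9], [6.2], [6.8], [9.1], [9.9]]
X = [[1.0, a, b] for a, b in zip(x2, x3)]

# One restriction, beta2 + beta3 = 1, written as R beta = r
R = [[0.0, 1.0, 1.0]]
r = [[1.0]]

beta, beta_star = restricted_ls(X, y, R, r)
print("unrestricted:", [round(b[0], 4) for b in beta])
print("restricted:  ", [round(b[0], 4) for b in beta_star])
```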


Appendix CA 


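CA.1 Derivation of k Normal or Simultaneous Equations

Differentiating

$$\sum \hat u_i^2 = \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \cdots - \hat\beta_k X_{ki}\right)^2$$

partially with respect to β̂₁, β̂₂, ..., β̂ₖ, we obtain

$$\begin{aligned}
\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_1} &= 2\sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \cdots - \hat\beta_k X_{ki}\right)(-1) \\
\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_2} &= 2\sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \cdots - \hat\beta_k X_{ki}\right)(-X_{2i}) \\
&\;\;\vdots \\
\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_k} &= 2\sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \cdots - \hat\beta_k X_{ki}\right)(-X_{ki})
\end{aligned}$$

Setting the preceding partial derivatives equal to zero and rearranging the terms, we obtain the k normal equations given in Eq. (C.3.8).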


*See J. Johnston, op. cit., p. 205.
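CA.2 Matrix Derivation of Normal Equations

From Eq. (C.3.7) we obtain

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = \mathbf{y}'\mathbf{y} - 2\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta}$$

Using the rules of matrix differentiation given in Appendix B, Section B.6, we obtain

$$\frac{\partial(\hat{\mathbf{u}}'\hat{\mathbf{u}})}{\partial\hat{\boldsymbol\beta}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta}$$

Setting the preceding equation to zero gives

$$(\mathbf{X}'\mathbf{X})\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{y}$$

whence β̂ = (X'X)⁻¹X'y, provided the inverse exists.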


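CA.3 Variance-Covariance Matrix of β̂

From Eq. (C.3.11) we obtain

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

Substituting y = Xβ + u into the preceding expression gives

$$\begin{aligned}
\hat{\boldsymbol\beta} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol\beta + \mathbf{u}) \\
&= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol\beta + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} \\
&= \boldsymbol\beta + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}
\end{aligned} \tag{1}$$

Therefore,

$$\hat{\boldsymbol\beta} - \boldsymbol\beta = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} \tag{2}$$

By definition

$$\begin{aligned}
\text{var-cov}(\hat{\boldsymbol\beta}) &= E\left[(\hat{\boldsymbol\beta} - \boldsymbol\beta)(\hat{\boldsymbol\beta} - \boldsymbol\beta)'\right] \\
&= E\left\{\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}\right]\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}\right]'\right\} \\
&= E\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}\mathbf{u}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\right]
\end{aligned} \tag{3}$$

where in the last step use is made of the fact that (AB)' = B'A' and that (X'X)⁻¹ is symmetric.

Noting that the X's are nonstochastic, on taking expectation of Eq. (3) we obtain

$$\begin{aligned}
\text{var-cov}(\hat{\boldsymbol\beta}) &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{u}\mathbf{u}')\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\
&= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\sigma^2\mathbf{I}\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\
&= \sigma^2(\mathbf{X}'\mathbf{X})^{-1}
\end{aligned}$$

which is the result given in Eq. (C.3.13). Note that in deriving the preceding result use is made of the assumption that E(uu') = σ²I.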
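The results of CA.2 and CA.3 translate directly into code. The sketch below computes β̂ = (X'X)⁻¹X'y. It then replaces the unknown σ² in σ²(X'X)⁻¹ by the usual estimator σ̂² = û'û/(n − k). The data and helper names are illustrative only.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inverse(A):
    """Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [row[n:] for row in M]

def ols(X, y):
    """beta-hat, sigma2-hat, and the estimated var-cov matrix sigma2-hat (X'X)^-1."""
    n, k = len(X), len(X[0])
    XtX_inv = inverse(matmul(transpose(X), X))
    beta = matmul(XtX_inv, matmul(transpose(X), y))                 # Eq. (C.3.11)
    fitted = matmul(X, beta)
    rss = sum((yi[0] - fi[0]) ** 2 for yi, fi in zip(y, fitted))    # u-hat'u-hat
    sigma2 = rss / (n - k)
    varcov = [[sigma2 * v for v in row] for row in XtX_inv]         # Eq. (C.3.13)
    return beta, sigma2, varcov

# Illustrative data: a constant plus one regressor
x = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.0, xi] for xi in x]
y = [[2.1], [3.9], [6.2], [7.8], [10.1]]

beta, sigma2, varcov = ols(X, y)
print("beta-hat:  ", [round(b[0], 4) for b in beta])
print("sigma2-hat:", round(sigma2, 6))
print("std errors:", [round(varcov[i][i] ** 0.5, 4) for i in range(len(beta))])
```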

CA.4 BLUE Property of OLS Estimators 

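From Eq. (C.3.11) we have

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \tag{1}$$

Since (X'X)⁻¹X' is a matrix of fixed numbers, β̂ is a linear function of y. Hence, by definition it is a linear estimator.

Recall that the PRF is

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{u} \tag{2}$$

Substituting this into Eq. (1), we obtain

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol\beta + \mathbf{u}) \tag{3}$$

$$\hat{\boldsymbol\beta} = \boldsymbol\beta + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} \tag{4}$$

since (X'X)⁻¹X'X = I.

Taking expectation of Eq. (4) gives

$$E(\hat{\boldsymbol\beta}) = E(\boldsymbol\beta) + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{u}) = \boldsymbol\beta \tag{5}$$

since E(β) = β (why?) and E(u) = 0 by assumption, which shows that β̂ is an unbiased estimator of β.

Let β̂* be any other linear estimator of β, which can be written as

$$\hat{\boldsymbol\beta}^* = \left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' + \mathbf{C}\right]\mathbf{y} \tag{6}$$

where C is a matrix of constants.

Substituting for y from Eq. (2) into Eq. (6), we get

$$\begin{aligned}
\hat{\boldsymbol\beta}^* &= \left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' + \mathbf{C}\right](\mathbf{X}\boldsymbol\beta + \mathbf{u}) \\
&= \boldsymbol\beta + \mathbf{C}\mathbf{X}\boldsymbol\beta + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} + \mathbf{C}\mathbf{u}
\end{aligned} \tag{7}$$

Now if β̂* is to be an unbiased estimator of β, we must have

$$\mathbf{C}\mathbf{X} = \mathbf{0} \qquad \text{(Why?)} \tag{8}$$

Using Eq. (8), Eq. (7) can be written as

$$\hat{\boldsymbol\beta}^* - \boldsymbol\beta = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} + \mathbf{C}\mathbf{u} \tag{9}$$

By definition, var-cov(β̂*) is

$$E(\hat{\boldsymbol\beta}^* - \boldsymbol\beta)(\hat{\boldsymbol\beta}^* - \boldsymbol\beta)' = E\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} + \mathbf{C}\mathbf{u}\right]\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} + \mathbf{C}\mathbf{u}\right]' \tag{10}$$

Making use of the properties of matrix inversion and transposition* and after algebraic simplification (the cross-product terms vanish because CX = 0 implies X'C' = 0), we obtain

$$\begin{aligned}
\text{var-cov}(\hat{\boldsymbol\beta}^*) &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1} + \sigma^2\mathbf{C}\mathbf{C}' \\
&= \text{var-cov}(\hat{\boldsymbol\beta}) + \sigma^2\mathbf{C}\mathbf{C}'
\end{aligned} \tag{11}$$

which shows that the variance-covariance matrix of the alternative unbiased linear estimator β̂* is equal to the variance-covariance matrix of the OLS estimator β̂ plus σ² times CC', which is a positive semidefinite matrix. Hence the variance of a given element of β̂* must necessarily be equal to or greater than the variance of the corresponding element of β̂, which shows that β̂ is BLUE. Of course, if C is a null matrix, i.e., C = 0, then β̂* = β̂, which is another way of saying that if we have found a BLUE estimator, it must be the least-squares estimator β̂.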


*See references in Appendix B. 
TABLE D.1 Areas Under the Standardized Normal Distribution

Example

Pr(0 < Z < 1.96) = 0.4750
Pr(Z > 1.96) = 0.5 − 0.4750 = 0.025

Note: This table gives the area under the right-hand half of the distribution, between 0 and Z (i.e., for Z > 0). But since the normal distribution is symmetrical about Z = 0, the area in the left-hand half is the same as the area in the corresponding right-hand half. For example, P(−1.96 < Z < 0) = 0.4750. Therefore, P(−1.96 < Z < 1.96) = 2(0.4750) = 0.95.
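The areas in Table D.1 can be reproduced with the error function in Python's standard library, since Pr(0 < Z < z) = ½ erf(z/√2) for a standard normal Z. A minimal sketch:

```python
import math

def area_0_to_z(z: float) -> float:
    """Pr(0 < Z < z) for a standard normal Z and z >= 0."""
    return 0.5 * math.erf(z / math.sqrt(2))

print(f"Pr(0 < Z < 1.96)     = {area_0_to_z(1.96):.4f}")
print(f"Pr(Z > 1.96)         = {0.5 - area_0_to_z(1.96):.4f}")
print(f"Pr(-1.96 < Z < 1.96) = {2 * area_0_to_z(1.96):.4f}")
```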
TABLE D.2 Percentage Points of the t Distribution

Source: From E. S. Pearson and H. O. Hartley, eds., Biometrika Tables for Statisticians, vol. 1, 3d ed., table 12, Cambridge University Press, New York, 1966. Reproduced by permission of the editors and trustees of Biometrika.

Example

Pr(t > 2.086) = 0.025
Pr(t > 1.725) = 0.05    for df = 20
Pr(|t| > 1.725) = 0.10




Note: The smaller probability shown at the head of each column is the area in one tail; the larger probability is the area in 
TABLE D.3 Upper Percentage Points of the F Distribution

Example

Pr(F > 1.59) = 0.25
Pr(F > 2.42) = 0.10    for df N₁ = 10
Pr(F > 3.14) = 0.05    and N₂ = 9
Pr(F > 5.26) = 0.01

N₂ = df for denominator (rows); N₁ = df for numerator (column headings)

N₂   Pr      1     2     3     4     5     6     7     8     9    10    11    12
1    .25  5.83  7.50  8.20  8.58  8.82  8.98  9.10  9.19  9.26  9.32  9.36  9.41
     .10  39.9  49.5  53.6  55.8  57.2  58.2  58.9  59.4  59.9  60.2  60.5  60.7
     .05   161   200   216   225   230   234   237   239   241   242   243   244
2    .25  2.57  3.00  3.15  3.23  3.28  3.31  3.34  3.35  3.37  3.38  3.39  3.39
     .10  8.53  9.00  9.16  9.24  9.29  9.33  9.35  9.37  9.38  9.39  9.40  9.41
     .05  18.5  19.0  19.2  19.2  19.3  19.3  19.4  19.4  19.4  19.4  19.4  19.4
     .01  98.5  99.0  99.2  99.2  99.3  99.3  99.4  99.4  99.4  99.4  99.4  99.4
3    .25  2.02  2.28  2.36  2.39  2.41  2.42  2.43  2.44  2.44  2.44  2.45  2.45
     .10  5.54  5.46  5.39  5.34  5.31  5.28  5.27  5.25  5.24  5.23  5.22  5.22
     .05  10.1  9.55  9.28  9.12  9.01  8.94  8.89  8.85  8.81  8.79  8.76  8.74
     .01  34.1  30.8  29.5  28.7  28.2  27.9  27.7  27.5  27.3  27.2  27.1  27.1
4    .25  1.81  2.00  2.05  2.06  2.07  2.08  2.08  2.08  2.08  2.08  2.08  2.08
     .10  4.54  4.32  4.19  4.11  4.05  4.01  3.98  3.95  3.94  3.92  3.91  3.90
     .05  7.71  6.94  6.59  6.39  6.26  6.16  6.09  6.04  6.00  5.96  5.94  5.91
     .01  21.2  18.0  16.7  16.0  15.5  15.2  15.0  14.8  14.7  14.5  14.4  14.4
5    .25  1.69  1.85  1.88  1.89  1.89  1.89  1.89  1.89  1.89  1.89  1.89  1.89
     .10  4.06  3.78  3.62  3.52  3.45  3.40  3.37  3.34  3.32  3.30  3.28  3.27
     .05  6.61  5.79  5.41  5.19  5.05  4.95  4.88  4.82  4.77  4.74  4.71  4.68
     .01  16.3  13.3  12.1  11.4  11.0  10.7  10.5  10.3  10.2  10.1  9.96  9.89
6    .25  1.62  1.76  1.78  1.79  1.79  1.78  1.78  1.78  1.77  1.77  1.77  1.77
     .10  3.78  3.46  3.29  3.18  3.11  3.05  3.01  2.98  2.96  2.94  2.92  2.90
     .05  5.99  5.14  4.76  4.53  4.39  4.28  4.21  4.15  4.10  4.06  4.03  4.00
     .01  13.7  10.9  9.78  9.15  8.75  8.47  8.26  8.10  7.98  7.87  7.79  7.72
7    .25  1.57  1.70  1.72  1.72  1.71  1.71  1.70  1.70  1.69  1.69  1.69  1.68
     .10  3.59  3.26  3.07  2.96  2.88  2.83  2.78  2.75  2.72  2.70  2.68  2.67
     .05  5.59  4.74  4.35  4.12  3.97  3.87  3.79  3.73  3.68  3.64  3.60  3.57
     .01  12.2  9.55  8.45  7.85  7.46  7.19  6.99  6.84  6.72  6.62  6.54  6.47
8    .25  1.54  1.66  1.67  1.66  1.66  1.65  1.64  1.64  1.63  1.63  1.63  1.62
     .10  3.46  3.11  2.92  2.81  2.73  2.67  2.62  2.59  2.56  2.54  2.52  2.50
     .05  5.32  4.46  4.07  3.84  3.69  3.58  3.50  3.44  3.39  3.35  3.31  3.28
     .01  11.3  8.65  7.59  7.01  6.63  6.37  6.18  6.03  5.91  5.81  5.73  5.67
9    .25  1.51  1.62  1.63  1.63  1.62  1.61  1.60  1.60  1.59  1.59  1.58  1.58
     .10  3.36  3.01  2.81  2.69  2.61  2.55  2.51  2.47  2.44  2.42  2.40  2.38
     .05  5.12  4.26  3.86  3.63  3.48  3.37  3.29  3.23  3.18  3.14  3.10  3.07
     .01  10.6  8.02  6.99  6.42  6.06  5.80  5.61  5.47  5.35  5.26  5.18  5.11

Source: From E. S. Pearson and H. O. Hartley, eds., Biometrika Tables for Statisticians, vol. 1, 3d ed., table 18, Cambridge University Press, New York, 1966. Reproduced by permission of the editors and trustees of Biometrika.
TABLE D.3 Upper Percentage Points of the F Distribution (Continued)

N₂ = df for denominator (rows); N₁ = df for numerator (column headings)

N₂   Pr      1     2     3     4     5     6     7     8     9    10    11    12
10   .25  1.49  1.60  1.60  1.59  1.59  1.58  1.57  1.56  1.56  1.55  1.55  1.54
     .10  3.29  2.92  2.73  2.61  2.52  2.46  2.41  2.38  2.35  2.32  2.30  2.28
     .05  4.96  4.10  3.71  3.48  3.33  3.22  3.14  3.07  3.02  2.98  2.94  2.91
     .01  10.0  7.56  6.55  5.99  5.64  5.39  5.20  5.06  4.94  4.85  4.77  4.71
11   .25  1.47  1.58  1.58  1.57  1.56  1.55  1.54  1.53  1.53  1.52  1.52  1.51
     .10  3.23  2.86  2.66  2.54  2.45  2.39  2.34  2.30  2.27  2.25  2.23  2.21
     .05  4.84  3.98  3.59  3.36  3.20  3.09  3.01  2.95  2.90  2.85  2.82  2.79
     .01  9.65  7.21  6.22  5.67  5.32  5.07  4.89  4.74  4.63  4.54  4.46  4.40
12   .25  1.46  1.56  1.56  1.55  1.54  1.53  1.52  1.51  1.51  1.50  1.50  1.49
     .10  3.18  2.81  2.61  2.48  2.39  2.33  2.28  2.24  2.21  2.19  2.17  2.15
     .05  4.75  3.89  3.49  3.26  3.11  3.00  2.91  2.85  2.80  2.75  2.72  2.69
     .01  9.33  6.93  5.95  5.41  5.06  4.82  4.64  4.50  4.39  4.30  4.22  4.16
13   .25  1.45  1.55  1.55  1.53  1.52  1.51  1.50  1.49  1.49  1.48  1.47  1.47
     .10  3.14  2.76  2.56  2.43  2.35  2.28  2.23  2.20  2.16  2.14  2.12  2.10
     .05  4.67  3.81  3.41  3.18  3.03  2.92  2.83  2.77  2.71  2.67  2.63  2.60
     .01  9.07  6.70  5.74  5.21  4.86  4.62  4.44  4.30  4.19  4.10  4.02  3.96
14   .25  1.44  1.53  1.53  1.52  1.51  1.50  1.49  1.48  1.47  1.46  1.46  1.45
     .10  3.10  2.73  2.52  2.39  2.31  2.24  2.19  2.15  2.12  2.10  2.08  2.05
     .05  4.60  3.74  3.34  3.11  2.96  2.85  2.76  2.70  2.65  2.60  2.57  2.53
     .01  8.86  6.51  5.56  5.04  4.69  4.46  4.28  4.14  4.03  3.94  3.86  3.80
15   .25  1.43  1.52  1.52  1.51  1.49  1.48  1.47  1.46  1.46  1.45  1.44  1.44
     .10  3.07  2.70  2.49  2.36  2.27  2.21  2.16  2.12  2.09  2.06  2.04  2.02
     .05  4.54  3.68  3.29  3.06  2.90  2.79  2.71  2.64  2.59  2.54  2.51  2.48
     .01  8.68  6.36  5.42  4.89  4.56  4.32  4.14  4.00  3.89  3.80  3.73  3.67
16   .25  1.42  1.51  1.51  1.50  1.48  1.47  1.46  1.45  1.44  1.44  1.44  1.43
     .10  3.05  2.67  2.46  2.33  2.24  2.18  2.13  2.09  2.06  2.03  2.01  1.99
     .05  4.49  3.63  3.24  3.01  2.85  2.74  2.66  2.59  2.54  2.49  2.46  2.42
     .01  8.53  6.23  5.29  4.77  4.44  4.20  4.03  3.89  3.78  3.69  3.62  3.55
17   .25  1.42  1.51  1.50  1.49  1.47  1.46  1.45  1.44  1.43  1.43  1.42  1.41
     .10  3.03  2.64  2.44  2.31  2.22  2.15  2.10  2.06  2.03  2.00  1.98  1.96
     .05  4.45  3.59  3.20  2.96  2.81  2.70  2.61  2.55  2.49  2.45  2.41  2.38
     .01  8.40  6.11  5.18  4.67  4.34  4.10  3.93  3.79  3.68  3.59  3.52  3.46
18   .25  1.41  1.50  1.49  1.48  1.46  1.45  1.44  1.43  1.42  1.42  1.41  1.40
     .10  3.01  2.62  2.42  2.29  2.20  2.13  2.08  2.04  2.00  1.98  1.96  1.93
     .05  4.41  3.55  3.16  2.93  2.77  2.66  2.58  2.51  2.46  2.41  2.37  2.34
     .01  8.29  6.01  5.09  4.58  4.25  4.01  3.84  3.71  3.60  3.51  3.43  3.37
19   .25  1.41  1.49  1.49  1.47  1.46  1.44  1.43  1.42  1.41  1.41  1.40  1.40
     .10  2.99  2.61  2.40  2.27  2.18  2.11  2.06  2.02  1.98  1.96  1.94  1.91
     .05  4.38  3.52  3.13  2.90  2.74  2.63  2.54  2.48  2.42  2.38  2.34  2.31
     .01  8.18  5.93  5.01  4.50  4.17  3.94  3.77  3.63  3.52  3.43  3.36  3.30
20   .25  1.40  1.49  1.48  1.46  1.45  1.44  1.43  1.42  1.41  1.40  1.39  1.39
     .10  2.97  2.59  2.38  2.25  2.16  2.09  2.04  2.00  1.96  1.94  1.92  1.89
     .05  4.35  3.49  3.10  2.87  2.71  2.60  2.51  2.45  2.39  2.35  2.31  2.28
     .01  8.10  5.85  4.94  4.43  4.10  3.87  3.70  3.56  3.46  3.37  3.29  3.23
TABLE D.3 Upper Percentage Points of the F Distribution (Continued)

N₂ = df for denominator (rows); N₁ = df for numerator (column headings)

N₂   Pr      1     2     3     4     5     6     7     8     9    10    11    12
22   .25  1.40  1.48  1.47  1.45  1.44  1.42  1.41  1.40  1.39  1.39  1.38  1.37
     .10  2.95  2.56  2.35  2.22  2.13  2.06  2.01  1.97  1.93  1.90  1.88  1.86
     .05  4.30  3.44  3.05  2.82  2.66  2.55  2.46  2.40  2.34  2.30  2.26  2.23
     .01  7.95  5.72  4.82  4.31  3.99  3.76  3.59  3.45  3.35  3.26  3.18  3.12
24   .25  1.39  1.47  1.46  1.44  1.43  1.41  1.40  1.39  1.38  1.38  1.37  1.36
     .10  2.93  2.54  2.33  2.19  2.10  2.04  1.98  1.94  1.91  1.88  1.85  1.83
     .05  4.26  3.40  3.01  2.78  2.62  2.51  2.42  2.36  2.30  2.25  2.21  2.18
     .01  7.82  5.61  4.72  4.22  3.90  3.67  3.50  3.36  3.26  3.17  3.09  3.03
26   .25  1.38  1.46  1.45  1.44  1.42  1.41  1.39  1.38  1.37  1.37  1.36  1.35
     .10  2.91  2.52  2.31  2.17  2.08  2.01  1.96  1.92  1.88  1.86  1.84  1.81
     .05  4.23  3.37  2.98  2.74  2.59  2.47  2.39  2.32  2.27  2.22  2.18  2.15
     .01  7.72  5.53  4.64  4.14  3.82  3.59  3.42  3.29  3.18  3.09  3.02  2.96
28   .25  1.38  1.46  1.45  1.43  1.41  1.40  1.39  1.38  1.37  1.36  1.35  1.34
     .10  2.89  2.50  2.29  2.16  2.06  2.00  1.94  1.90  1.87  1.84  1.81  1.79
     .05  4.20  3.34  2.95  2.71  2.56  2.45  2.36  2.29  2.24  2.19  2.15  2.12
     .01  7.64  5.45  4.57  4.07  3.75  3.53  3.36  3.23  3.12  3.03  2.96  2.90
30   .25  1.38  1.45  1.44  1.42  1.41  1.39  1.38  1.37  1.36  1.35  1.35  1.34
     .10  2.88  2.49  2.28  2.14  2.05  1.98  1.93  1.88  1.85  1.82  1.79  1.77
     .05  4.17  3.32  2.92  2.69  2.53  2.42  2.33  2.27  2.21  2.16  2.13  2.09
     .01  7.56  5.39  4.51  4.02  3.70  3.47  3.30  3.17  3.07  2.98  2.91  2.84
40   .25  1.36  1.44  1.42  1.40  1.39  1.37  1.36  1.35  1.34  1.33  1.32  1.31
     .10  2.84  2.44  2.23  2.09  2.00  1.93  1.87  1.83  1.79  1.76  1.73  1.71
     .05  4.08  3.23  2.84  2.61  2.45  2.34  2.25  2.18  2.12  2.08  2.04  2.00

.01 

7.31 

5.18 

4.31 

3.83 

3.51 

3.29 

3.12 

2.99 

2.89 

2.80 

2.73 

2.66 


.25 

1.35 

1.42 

1.41 

1.38 

1.37 

1.35 

1.33 

1.32 

1.31 

1.30 

1.29 

1.29 

60 

.10 

2.79 

2.39 

2.18 

2.04 

1.95 

1.87 

1.82 

1.77 

1.74 

1.71 

1.68 

1.66 

.05 

4.00 

3.15 

2.76 

2.53 

2.37 

2.25 

2.17 

2.10 

2.04 

1.99 

1.95 

1.92 


.01 

7.08 

4.98 

4.13 

3.65 

3.34 

3.12 

2.95 

2.82 

2.72 

2.63 

2.56 

2.50 


.25 

1.34 

1.40 

1.39 

1.37 

1.35 

1.33 

1.31 

1.30 

1.29 

1.28 

1.27 

1.26 

120 

.10 

2.75 

2.35 

2.13 

1.99 

1.90 

1.82 

1.77 

1.72 

1.68 

1.65 

1.62 

1.60 

.05 

3.92 

3.07 

2.68 

2.45 

2.29 

2.17 

2.09 

2.02 

1.96 

1.91 

1.87 

1.83 


.01 

6.85 

4.79 

3.95 

3.48 

3.17 

2.96 

2.79 

2.66 

2.56 

2.47 

2.40 

2.34 


.25 

1.33 

1.39 

1.38 

1.36 

1.34 

1.32 

1.31 

1.29 

1.28 

1.27 

1.26 

1.25 

200 

.10 

2.73 

2.33 

2.11 

1.97 

1.88 

1.80 

1.75 

1.70 

1.66 

1.63 

1.60 

1.57 

.05 

3.89 

3.04 

2.65 

2.42 

2.26 

2.14 

2.06 

1.98 

1.93 

1.88 

1.84 

1.80 


.01 

6.76 

4.71 

3.88 

3.41 

3.11 

2.89 

2.73 

2.60 

2.50 

2.41 

2.34 

2.27 


.25 

1.32 

1.39 

1.37 

1.35 

1.33 

1.31 

1.29 

1.28 

1.27 

1.25 

1.24 

1.24 


.10 

2.71 

2.30 

2.08 

1.94 

1.85 

1.77 

1.72 

1.67 

1.63 

1.60 

1.57 

1.55 

00 

.05 

3.84 

3.00 

2.60 

2.37 

2.21 

2.10 

2.01 

1.94 

1.88 

1.83 

1.79 

1.75 


.01 

6.63 

4.61 

3.78 

3.32 

3.02 

2.80 

2.64 

2.51 

2.41 

2.32 

2.25 

2.18 
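The entries of Table D.3 can also be reproduced numerically, which helps when a needed combination of degrees of freedom is not tabulated. The following is a minimal Python sketch and not the procedure used to compute the published table. It writes the upper-tail probability Pr(F > f) for N1 and N2 degrees of freedom in terms of the regularized incomplete beta function, evaluates that function with a standard continued fraction, and then solves Pr(F > f) = Pr for f by bisection. The function names are illustrative.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 500, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coeff / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor of the odd step is ~1: converged
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction on the side where it converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, df1: int, df2: int) -> float:
    """Pr(F > f) for F with df1 numerator and df2 denominator degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(pr: float, df1: int, df2: int, tol: float = 1e-10) -> float:
    """Upper percentage point: the f with Pr(F > f) = pr, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > pr:  # widen the bracket until it contains f
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, df1, df2) > pr:
            lo = mid  # tail area still too large: the point lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Compare with the N1 = 2, N2 = 20 entries of Table D.3.
    for pr in (0.25, 0.10, 0.05, 0.01):
        print(pr, round(f_critical(pr, 2, 20), 2))
```

Run as a script, it should print 1.49, 2.59, 3.49, and 5.85, the N1 = 2, N2 = 20 entries of the table. For N1 = 2 the answer can also be checked in closed form, since Pr(F > f) = (1 + 2f/N2)^(-N2/2) in that case.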
TABLE D.4
Upper Percentage Points of the χ² Distribution

Example

Pr(χ² > 10.85) = 0.95
Pr(χ² > 23.83) = 0.25   for df = 20
Pr(χ² > 31.41) = 0.05

[Figure: χ² density for df = 20 with the points 10.85, 23.83, and 31.41 marked; the area to the right of 23.83 is 25%.]

*For df greater than 100 the expression √(2χ²) - √(2k - 1) = Z follows the standardized normal distribution, where k represents
the degrees of freedom.
df\Pr      .750      .500      .250      .100      .050      .025      .010      .005

    1  .1015308   .454937   1.32330   2.70554   3.84146   5.02389   6.63490   7.87944
    2   .575364   1.38629   2.77259   4.60517   5.99147   7.37776   9.21034   10.5966
    3  1.212534   2.36597   4.10835   6.25139   7.81473   9.34840   11.3449   12.8381
    4   1.92255   3.35670   5.38527   7.77944   9.48773   11.1433   13.2767   14.8602
    5   2.67460   4.35146   6.62568   9.23635   11.0705   12.8325   15.0863   16.7496
    6   3.45460   5.34812   7.84080   10.6446   12.5916   14.4494   16.8119   18.5476
    7   4.25485   6.34581   9.03715   12.0170   14.0671   16.0128   18.4753   20.2777
    8   5.07064   7.34412   10.2188   13.3616   15.5073   17.5346   20.0902   21.9550
    9   5.89883   8.34283   11.3887   14.6837   16.9190   19.0228   21.6660   23.5893
   10   6.73720   9.34182   12.5489   15.9871   18.3070   20.4831   23.2093   25.1882
   11   7.58412   10.3410   13.7007   17.2750   19.6751   21.9200   24.7250   26.7569
   12   8.43842   11.3403   14.8454   18.5494   21.0261   23.3367   26.2170   28.2995
   13   9.29906   12.3398   15.9839   19.8119   22.3621   24.7356   27.6883   29.8194
   14   10.1653   13.3393   17.1170   21.0642   23.6848   26.1190   29.1413   31.3193
   15   11.0365   14.3389   18.2451   22.3072   24.9958   27.4884   30.5779   32.8013
   16   11.9122   15.3385   19.3688   23.5418   26.2962   28.8454   31.9999   34.2672
   17   12.7919   16.3381   20.4887   24.7690   27.5871   30.1910   33.4087   35.7185
   18   13.6753   17.3379   21.6049   25.9894   28.8693   31.5264   34.8053   37.1564
   19   14.5620   18.3376   22.7178   27.2036   30.1435   32.8523   36.1908   38.5822
   20   15.4518   19.3374   23.8277   28.4120   31.4104   34.1696   37.5662   39.9968
   21   16.3444   20.3372   24.9348   29.6151   32.6705   35.4789   38.9321   41.4010
   22   17.2396   21.3370   26.0393   30.8133   33.9244   36.7807   40.2894   42.7956
   23   18.1373   22.3369   27.1413   32.0069   35.1725   38.0757   41.6384   44.1813
   24   19.0372   23.3367   28.2412   33.1963   36.4151   39.3641   42.9798   45.5585
   25   19.9393   24.3366   29.3389   34.3816   37.6525   40.6465   44.3141   46.9278
   26   20.8434   25.3364   30.4345   35.5631   38.8852   41.9232   45.6417   48.2899
   27   21.7494   26.3363   31.5284   36.7412   40.1133   43.1944   46.9630   49.6449
   28   22.6572   27.3363   32.6205   37.9159   41.3372   44.4607   48.2782   50.9933
   29   23.5666   28.3362   33.7109   39.0875   42.5569   45.7222   49.5879   52.3356
   30   24.4776   29.3360   34.7998   40.2560   43.7729   46.9792   50.8922   53.6720
   40   33.6603   39.3354   45.6160   51.8050   55.7585   59.3417   63.6907   66.7659
   50   42.9421   49.3349   56.3336   63.1671   67.5048   71.4202   76.1539   79.4900
   60   52.2938   59.3347   66.9814   74.3970   79.0819   83.2976   88.3794   91.9517
   70   61.6983   69.3344   77.5766   85.5271   90.5312   95.0231   100.425   104.215
   80   71.1445   79.3343   88.1303   96.5782   101.879   106.629   112.329   116.321
   90   80.6247   89.3342   98.6499   107.565   113.145   118.136   124.116   128.299
  100   90.1332   99.3341   109.141   118.498   124.342   129.561   135.807   140.169
Source: Abridged from E. S. Pearson and H. O. Hartley, eds., Biometrika Tables for Statisticians, vol. 1, 3d ed., table 8, Cambridge University
Press, New York, 1966.
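The χ² percentage points can be computed as well. The sketch below is again only an illustration under stated assumptions. It uses the fact that the upper tail of a χ² variable with k df equals the regularized upper incomplete gamma function Q(k/2, x/2), which it evaluates with a power series or a continued fraction, and solves for the percentage point by bisection. It also implements the footnote's large-sample rule, which treats √(2χ²) - √(2k - 1) as standard normal. The function names are illustrative.

```python
import math
from statistics import NormalDist


def reg_upper_gamma(a: float, x: float, max_iter: int = 1000, eps: float = 1e-15) -> float:
    """Regularized upper incomplete gamma function Q(a, x) = 1 - P(a, x)."""
    if x <= 0.0:
        return 1.0
    log_front = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x), which converges quickly here.
        term = total = 1.0 / a
        denom = a
        for _ in range(max_iter):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_front)
    # Continued fraction for Q(a, x) (modified Lentz method).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return math.exp(log_front) * h


def chi2_upper_tail(x: float, df: int) -> float:
    """Pr(chi-square with df degrees of freedom > x)."""
    return reg_upper_gamma(df / 2.0, x / 2.0)


def chi2_critical(pr: float, df: int, tol: float = 1e-10) -> float:
    """Upper percentage point: the x with Pr(chi2 > x) = pr, found by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_upper_tail(hi, df) > pr:  # widen the bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_upper_tail(mid, df) > pr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_critical_fisher(pr: float, df: int) -> float:
    """Footnote approximation: sqrt(2*chi2) - sqrt(2k - 1) ~ N(0, 1), solved for chi2."""
    z = NormalDist().inv_cdf(1.0 - pr)
    return 0.5 * (z + math.sqrt(2.0 * df - 1.0)) ** 2


if __name__ == "__main__":
    print(round(chi2_critical(0.05, 20), 4))
    print(round(chi2_critical_fisher(0.05, 100), 2), round(chi2_critical(0.05, 100), 3))
```

For instance, chi2_critical(0.05, 20) returns about 31.4104, matching the table, and the normal approximation at 100 df gives about 124.06 against the tabulated 124.342, so the rule in the footnote is already close at that point.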
TABLE D.5A Durbin-Watson d Statistic: Significance Points of dL and dU at 0.05 Level of Significance

         k'=11        k'=12        k'=13        k'=14        k'=15        k'=16        k'=17        k'=18        k'=19        k'=20
   n     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
  16  0.098 3.503
  18  0.138 3.378  0.087 3.557      ?     ?
  20  0.263 3.063  0.200 3.234  0.145 3.395  0.100 3.542  0.063 3.676
  21  0.307 2.976  0.240     ?  0.182 3.300  0.132 3.448  0.091 3.583  0.058 3.705
  23  0.391 2.826  0.322 2.979  0.259 3.128  0.202 3.272  0.153 3.409  0.110 3.535  0.076 3.650  0.048 3.753
  24  0.431 2.761  0.362 2.908  0.297 3.053  0.239 3.193  0.186 3.327  0.141 3.454  0.101 3.572  0.070 3.678  0.044 3.773
  25  0.470 2.702  0.400 2.844  0.335 2.983  0.275 3.119  0.221 3.251  0.172 3.376  0.130 3.494  0.094 3.604  0.065 3.702  0.041 3.790
  27  0.544 2.600  0.475 2.730  0.409 2.859  0.348 2.987  0.291 3.112  0.238 3.233  0.191 3.349  0.149 3.460      ? 3.563      ? 3.658
  28  0.578 2.555  0.510 2.680  0.445 2.805  0.383 2.928  0.325 3.050  0.271 3.168  0.222 3.283  0.178 3.392  0.138 3.495  0.104 3.592
  29  0.612 2.515  0.544 2.634  0.512 2.755  0.418 2.874  0.359 2.992  0.305 3.107  0.254 3.219  0.208 3.327  0.166 3.431  0.129 3.528
  32  0.703 2.411  0.638 2.517  0.576 2.625  0.515 2.733  0.457 2.840  0.370 2.946  0.349 3.050  0.299 3.153  0.253 3.252  0.211 3.348
  33  0.731 2.382  0.668 2.484  0.606 2.588  0.546 2.692  0.488 2.796  0.432 2.899  0.379 3.000  0.329 3.100  0.283 3.198  0.239 3.293
  35  0.783 2.330  0.722 2.425  0.662 2.521  0.604 2.619  0.547 2.716  0.492 2.813  0.439 2.910  0.388 3.005  0.340 3.099  0.295 3.190
  36  0.808 2.306  0.748 2.398  0.689 2.492  0.631 2.586  0.575 2.680  0.520 2.774  0.467 2.868  0.417 2.961  0.369 3.053  0.323 3.142
  37  0.831 2.285  0.772 2.374  0.714 2.464  0.657 2.555  0.602 2.646  0.548 2.738  0.495 2.829  0.445 2.920  0.397 3.009  0.351 3.097
  39  0.875 2.246  0.819 2.329  0.763 2.413  0.707 2.499  0.653 2.585  0.600 2.671  0.549 2.757  0.499 2.843  0.451 2.929  0.404 3.013
  40  0.896 2.228  0.840 2.309  0.785 2.391  0.731 2.473  0.678 2.557  0.626 2.641  0.575 2.724  0.525 2.808  0.477 2.892  0.430 2.974
  45  0.988 2.156  0.938 2.225  0.887 2.296  0.838 2.367  0.788 2.439  0.740 2.512  0.692 2.586  0.644 2.659  0.598 2.733  0.553 2.807
  50  1.064 2.103  1.019 2.163  0.973 2.225  0.927 2.287  0.882 2.350  0.836 2.414  0.792 2.479  0.747 2.544  0.703 2.610  0.660 2.675
  60  1.184 2.031  1.145 2.079  1.106 2.127  1.068 2.177  1.029 2.227  0.990 2.278  0.951 2.330  0.913 2.382  0.874 2.434  0.836 2.487
  65  1.231 2.006  1.195 2.049  1.160 2.093  1.124 2.138  1.088 2.183  1.052 2.229  1.016 2.276  0.980 2.323  0.944 2.371  0.908 2.419
  80  1.340 1.957  1.311 1.991  1.283 2.024  1.253 2.059  1.224 2.093  1.195 2.129  1.165 2.165  1.136 2.201  1.106 2.238  1.076 2.275
  85      ?     ?  1.342 1.977  1.315 2.009  1.287 2.040  1.260 2.073  1.232 2.105  1.205 2.139  1.177 2.172  1.149 2.206  1.121 2.241
 100      ?     ?  1.416 1.948  1.393 1.974  1.371 2.000  1.347 2.026  1.324 2.053  1.301 2.080  1.277 2.108  1.253 2.135  1.229 2.164
 150  1.579 1.892  1.564 1.908  1.550 1.924  1.535 1.940  1.519 1.956  1.504 1.972  1.489 1.989  1.474 2.006  1.458 2.023  1.443     ?

Note: n = number of observations, k' = number of explanatory variables excluding the constant term.

Source: This table is an extension of the original Durbin-Watson table and is reproduced from N. E. Savin and K. J. White, "The Durbin-Watson Test for
Serial Correlation with Extreme Small Samples or Many Regressors," Econometrica, vol. 45, November 1977, pp. 1989-96 and as corrected by
R. W. Farebrother, Econometrica, vol. 48, September 1980, p. 1554. Reprinted by permission of the Econometric Society.











EXAMPLE 1

If n = 40 and k' = 4, dL = 1.285 and dU = 1.721. If a computed d value is less than 1.285,
there is evidence of positive first-order serial correlation; if it is greater than 1.721,
there is no evidence of positive first-order serial correlation; but if d lies between the lower
and the upper limit, there is inconclusive evidence regarding the presence or absence of
positive first-order serial correlation.
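The steps in Example 1 can be written out directly. The sketch below, with illustrative function names, computes d = Σ(e_t - e_{t-1})² / Σe_t² from a list of OLS residuals e_t and then applies the three-way decision rule for positive first-order serial correlation. The bounds dL and dU still have to be read from the table for the relevant n and k'.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over the residual sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def dw_decision(d, d_lower, d_upper):
    """Decision rule for positive first-order serial correlation, as in Example 1."""
    if d < d_lower:
        return "evidence of positive serial correlation"
    if d > d_upper:
        return "no evidence of positive serial correlation"
    return "inconclusive"


if __name__ == "__main__":
    # Bounds from Example 1: n = 40, k' = 4.
    for d in (1.10, 1.50, 1.90):
        print(d, dw_decision(d, d_lower=1.285, d_upper=1.721))
```

With the bounds of Example 1, a computed d of 1.10 falls below dL = 1.285, a d of 1.50 falls in the inconclusive zone, and a d of 1.90 exceeds dU = 1.721.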
k' = number of explanatory variables excluding the constant term.
Source: Savin and White, op. cit., by permission of the Econometric Society.
TABLE D.6A Critical Values of Runs in the Runs Test

                                      N2
 N1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  2                                           2   2   2   2   2   2   2   2   2
  3                   2   2   2   2   2   2   2   2   2   3   3   3   3   3   3
  4               2   2   2   3   3   3   3   3   3   3   3   4   4   4   4   4
  5           2   2   3   3   3   3   3   4   4   4   4   4   4   4   5   5   5
  6       2   2   3   3   3   3   4   4   4   4   5   5   5   5   5   5   6   6
  7       2   2   3   3   3   4   4   5   5   5   5   5   6   6   6   6   6   6
  8       2   3   3   3   4   4   5   5   5   6   6   6   6   6   7   7   7   7
  9       2   3   3   4   4   5   5   5   6   6   6   7   7   7   7   8   8   8
 10       2   3   3   4   5   5   5   6   6   7   7   7   7   8   8   8   8   9
 11       2   3   4   4   5   5   6   6   7   7   7   8   8   8   9   9   9   9
 12   2   2   3   4   4   5   6   6   7   7   7   8   8   8   9   9   9  10  10
 13   2   2   3   4   5   5   6   6   7   7   8   8   9   9   9  10  10  10  10
 14   2   2   3   4   5   5   6   7   7   8   8   9   9   9  10  10  10  11  11
 15   2   3   3   4   5   6   6   7   7   8   8   9   9  10  10  11  11  11  12
 16   2   3   4   4   5   6   6   7   8   8   9   9  10  10  11  11  11  12  12
 17   2   3   4   4   5   6   7   7   8   9   9  10  10  11  11  11  12  12  13
 18   2   3   4   5   5   6   7   8   8   9   9  10  10  11  11  12  12  13  13
 19   2   3   4   5   6   6   7   8   8   9  10  10  11  11  12  12  13  13  13
 20   2   3   4   5   6   6   7   8   9   9  10  10  11  12  12  13  13  13  14


Note: Tables D.6A and D.6B give the critical values of runs n for various values of N1 (+ symbol) and N2 (- symbol). For the one-sample runs test, any value
of n that is equal to or smaller than that shown in Table D.6A or equal to or larger than that shown in Table D.6B is significant at the 0.05 level.

Source: Sidney Siegel, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill Book Company, New York, 1956, table F, pp. 252-253. The tables have been 
adapted by Siegel from the original source: Frieda S. Swed and C. Eisenhart, “Tables for Testing Randomness of Grouping in a Sequence of Alternatives,” Annals of 
Mathematical Statistics, vol. 14, 1943. Used by permission of McGraw-Hill Book Company and Annals of Mathematical Statistics. 


TABLE D.6B Critical Values of Runs in the Runs Test

                                      N2
 N1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  2
  3
  4               9   9
  5           9  10  10  11  11
  6           9  10  11  12  12  13  13  13  13
  7              11  12  13  13  14  14  14  14  15  15  15
  8              11  12  13  14  14  15  15  16  16  16  16  17  17  17  17  17
  9                  13  14  14  15  16  16  16  17  17  18  18  18  18  18  18
 10                  13  14  15  16  16  17  17  18  18  18  19  19  19  20  20
 11                  13  14  15  16  17  17  18  19  19  19  20  20  20  21  21
 12                  13  14  16  16  17  18  19  19  20  20  21  21  21  22  22
 13                      15  16  17  18  19  19  20  20  21  21  22  22  23  23
 14                      15  16  17  18  19  20  20  21  22  22  23  23  23  24
 15                      15  16  18  18  19  20  21  22  22  23  23  24  24  25
 16                          17  18  19  20  21  21  22  23  23  24  25  25  25
 17                          17  18  19  20  21  22  23  23  24  25  25  26  26
 18                          17  18  19  20  21  22  23  24  25  25  26  26  27
 19                          17  18  20  21  22  23  23  24  25  26  26  27  27
 20                          17  18  20  21  22  23  24  25  25  26  27  27  28
EXAMPLE 2 In a sequence of 30 observations consisting of 20 + signs (= N1) and 10 - signs (= N2),

the critical values of runs at the 0.05 level of significance are 9 and 20, as shown by
Tables D.6A and D.6B, respectively. Therefore, if in an application it is found that the
number of runs is equal to or less than 9 or equal to or greater than 20, one can reject
(at the 0.05 level of significance) the hypothesis that the observed sequence is random.
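Counting runs and applying the rule in Example 2 is mechanical. The sketch below, with illustrative function names, counts the runs in a string of + and - signs. It rejects randomness when the count is at or below the Table D.6A value or at or above the Table D.6B value. The critical values 9 and 20 for N1 = 20, N2 = 10 are passed in by hand rather than looked up.

```python
def count_runs(signs: str) -> int:
    """Number of runs, i.e., maximal blocks of identical consecutive symbols."""
    if not signs:
        return 0
    return 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if cur != prev)


def runs_test(signs: str, lower_crit: int, upper_crit: int):
    """Return (N1, N2, runs, reject) for a sequence of '+' and '-' signs.

    lower_crit and upper_crit come from Tables D.6A and D.6B for the observed
    N1 and N2; randomness is rejected at the 0.05 level when the number of
    runs is <= lower_crit or >= upper_crit.
    """
    n1, n2 = signs.count("+"), signs.count("-")
    runs = count_runs(signs)
    return n1, n2, runs, runs <= lower_crit or runs >= upper_crit


if __name__ == "__main__":
    # Example 2: N1 = 20, N2 = 10, critical values 9 and 20.
    print(runs_test("+" * 20 + "-" * 10, 9, 20))              # 2 runs
    print(runs_test("++-" * 5 + "+" * 10 + "-" * 5, 9, 20))  # 12 runs
```

A sequence of 20 plus signs followed by 10 minus signs has only 2 runs and is rejected as nonrandom, whereas the second sequence, with the same N1 and N2 but 12 runs, is not.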


TABLE D.7 1% and 5% Critical Dickey-Fuller t (= τ) and F Values for Unit Root Tests

Sample     τnc*            τc*           τct*            F†             F‡
  Size      1%     5%      1%     5%      1%     5%      1%     5%      1%     5%
    25   -2.66  -1.95   -3.75  -3.00   -4.38  -3.60   10.61   7.24    8.21   5.68
    50   -2.62  -1.95   -3.58  -2.93   -4.15  -3.50    9.31   6.73    7.02   5.13
   100   -2.60  -1.95   -3.51  -2.89   -4.04  -3.45    8.73   6.49    6.50   4.88
   250   -2.58  -1.95   -3.46  -2.88   -3.99  -3.43    8.43   6.34    6.22   4.75
   500   -2.58  -1.95   -3.44  -2.87   -3.98  -3.42    8.34   6.30    6.15   4.71
     ∞   -2.58  -1.95   -3.43  -2.86   -3.96  -3.41    8.27   6.25    6.09   4.68

*Subscripts nc, c, and ct denote, respectively, that there is no constant, a constant, and a constant and trend term in the regression Eq. (21.9.5).

†The critical F values are for the joint hypothesis that the constant and δ terms in Eq. (21.9.5) are simultaneously equal to zero.

‡The critical F values are for the joint hypothesis that the constant, trend, and δ terms in Eq. (21.9.5) are simultaneously equal to zero.

Source: Adapted from W. A. Fuller, Introduction to Statistical Time Series, John Wiley & Sons, New York, 1976, p. 373 (for the τ test), and D. A. Dickey and W. A. Fuller,
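To apply Table D.7, the computed τ statistic is compared with the critical value for the appropriate model and sample size, and the unit-root hypothesis is rejected when τ is more negative than that value. The sketch below stores only the τ columns of the table. Because the table lists only a few sample sizes, it uses the row for the largest tabulated size not exceeding n. That convention is an assumption of this sketch; it is conservative because the critical values become more negative as the sample shrinks. The sketch refuses samples smaller than 25, and the names are illustrative.

```python
import math

# tau critical values transcribed from Table D.7: {sample size: {model: (1%, 5%)}}.
# Models: "nc" = no constant, "c" = constant, "ct" = constant and trend.
TAU_TABLE = {
    25: {"nc": (-2.66, -1.95), "c": (-3.75, -3.00), "ct": (-4.38, -3.60)},
    50: {"nc": (-2.62, -1.95), "c": (-3.58, -2.93), "ct": (-4.15, -3.50)},
    100: {"nc": (-2.60, -1.95), "c": (-3.51, -2.89), "ct": (-4.04, -3.45)},
    250: {"nc": (-2.58, -1.95), "c": (-3.46, -2.88), "ct": (-3.99, -3.43)},
    500: {"nc": (-2.58, -1.95), "c": (-3.44, -2.87), "ct": (-3.98, -3.42)},
    math.inf: {"nc": (-2.58, -1.95), "c": (-3.43, -2.86), "ct": (-3.96, -3.41)},
}
LEVELS = {"1%": 0, "5%": 1}


def tau_critical(n: float, model: str, level: str = "5%") -> float:
    """Critical tau from the largest tabulated sample size not exceeding n."""
    sizes = [size for size in sorted(TAU_TABLE) if size <= n]
    if not sizes:
        raise ValueError("Table D.7 starts at a sample size of 25")
    return TAU_TABLE[sizes[-1]][model][LEVELS[level]]


def reject_unit_root(tau: float, n: float, model: str, level: str = "5%") -> bool:
    """Reject the unit-root null when tau is more negative than the critical value."""
    return tau < tau_critical(n, model, level)


if __name__ == "__main__":
    print(tau_critical(60, "c"))              # uses the n = 50 row
    print(reject_unit_root(-3.20, 100, "c"))  # -3.20 is below -2.89
```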













Appendix E


Computer Output of EViews, MINITAB, Excel, and STATA


In this appendix we show the computer output of EViews, MINITAB, Excel, and STATA,
which are some of the most popular statistical packages for regression and related
statistical routines. We use the data given in Table E.1 on the textbook website to illustrate the
output of these packages. Table E.1 gives data on the civilian labor force participation rate
(CLFPR), the civilian unemployment rate (CUNR), and real average hourly earnings in
1982 dollars (AHE82) for the U.S. economy for the period 1980 to 2002.

Although in many respects the basic regression output is similar in all these packages,
there are differences in how they present their results. Some packages give results to
several digits, whereas others round them to four or five digits. Some packages
give analysis of variance (ANOVA) tables directly, whereas for others the table
has to be derived. There are also differences in some of the summary statistics presented
by the various packages. It is beyond the scope of this appendix to enumerate all the
differences among these statistical packages; you can consult the websites of these packages for
further information.


E.1 EViews


Using Version 6 of EViews, we regressed CLFPR on CUNR and AHE82 and obtained the
results shown in Figure E.1.

This is the standard format in which EViews results are presented. The first part of this
figure gives the regression coefficients, their estimated standard errors, the t values under
the null hypothesis that the corresponding population values of these coefficients are
zero, and the p values of these t values. This is followed by R² and adjusted R². The other
summary output in the first part relates to the standard error of the regression, the residual
sum of squares (RSS), and the F value to test the hypothesis that the (true) values of all
the slope coefficients are simultaneously equal to zero. The Akaike information criterion and the
Schwarz criterion are often used to choose between competing models: the lower the value of these
criteria, the better the model. The method of maximum likelihood (ML) is an
alternative to the method of least squares. Just as in OLS we find those estimators that minimize
FIGURE E.1 EViews output of civilian labor force participation regression.

Dependent Variable: CLFPR
Method: Least Squares
Sample: 1980-2002
Included observations: 23

Variable        Coefficient    Std. Error    t-Statistic    Prob.

C                  80.90133      4.756195       17.00967    0.0000
CUNR              -0.671348      0.082720      -8.115928    0.0000
AHE82             -1.404244      0.608615      -2.307278    0.0319

R-squared              0.772765    Mean dependent var       65.89565
Adjusted R-squared     0.750042    S.D. dependent var       1.168713
S.E. of regression     0.584308    Akaike info criterion    1.884330
Sum squared resid      6.828312    Schwarz criterion        2.032438
Log likelihood        -18.66979    F-statistic              34.00731
Durbin-Watson stat     0.787625    Prob(F-statistic)        0.000000

Obs      Actual     Fitted    Residual    [Residual Plot]

1980    63.8000    65.2097    -1.40974
1981    63.9000    65.0004    -1.10044
1982    64.0000    63.6047     0.39535
1983    64.0000    63.5173     0.48268
1984    64.4000    64.9131    -0.51311
1985    64.8000    65.1566    -0.35664
1986    65.3000    65.2347     0.06526
1987    65.6000    65.8842    -0.28416
1988    65.9000    66.4103    -0.51027
1989    66.5000    66.6148    -0.11476
1990    66.5000    66.5819    -0.08186
1991    66.2000    65.8745     0.32546
1992    66.4000    65.4608     0.93923
1993    66.3000    65.8917     0.40834
1994    66.6000    66.4147     0.18530
1995    66.6000    66.7644    -0.1644
1996    66.8000    66.8425    -0.0425
1997    67.1000    67.0097     0.09032
1998    67.1000    66.9974     0.10263
1999    67.1000    67.0443     0.05569
2000    67.2000    67.1364     0.06355
2001    66.9000    66.4589     0.44105
2002    66.6000    65.5770     1.02304

[Histogram of residuals]

Series: Residuals
Sample 1980-2002
Observations 23

Mean        -1.39e-14
Median       0.063552
Maximum      1.023040
Minimum     -1.409735
Std. Dev.    0.557116
Skewness    -0.593013
Kurtosis     3.752631

Jarque-Bera
Probability
the error sum of squares, in ML we try to find those estimators that maximize the likelihood,
that is, the probability of observing the sample at hand. Under the assumption that the error term
is normally distributed, OLS and ML give identical estimates of the regression coefficients. The Durbin-Watson
statistic is used to find out whether there is first-order serial correlation in the error terms.
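The likelihood-based entries in Figure E.1 can be recovered from the residual sum of squares alone. With normal errors, the maximized log likelihood of a regression with n observations is -(n/2)[1 + ln(2π) + ln(RSS/n)]. EViews reports the Akaike and Schwarz criteria per observation, as -2(log L)/n + 2k/n and -2(log L)/n + k ln(n)/n, where k is the number of estimated coefficients. The sketch below assumes these definitions. With RSS = 6.828312, n = 23, and k = 3 it reproduces the log likelihood, both criteria, and the standard error of the regression shown in the figure to the digits printed there.

```python
import math


def ml_summary(rss: float, n: int, k: int) -> dict:
    """Log likelihood and information criteria for an OLS fit with normal errors.

    rss: residual sum of squares; n: observations; k: estimated coefficients
    (including the intercept). The criteria are scaled per observation.
    """
    sigma2_ml = rss / n  # ML estimator of the error variance (divides by n, not n - k)
    log_lik = -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(sigma2_ml))
    return {
        "log_likelihood": log_lik,
        "akaike": -2.0 * log_lik / n + 2.0 * k / n,
        "schwarz": -2.0 * log_lik / n + k * math.log(n) / n,
        "se_regression": math.sqrt(rss / (n - k)),  # OLS standard error of the regression
    }


if __name__ == "__main__":
    for name, value in ml_summary(rss=6.828312, n=23, k=3).items():
        print(f"{name:15s} {value:.6f}")
```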

The second part of the EViews output gives the actual and fitted values of the dependent
variable and the difference between the two, which is the residual. These residuals
are plotted alongside this output with a vertical line denoting zero. Points to the right of the
vertical line are positive residuals and those to the left are negative residuals.

The third part of the output gives the histogram of the residuals along with their
summary statistics. It gives the Jarque-Bera (JB) statistic to test for the normality of the error
terms and also gives the probability of obtaining the stated statistic. The higher the
probability of obtaining the observed JB statistic, the less reason there is to reject the null
hypothesis that the error terms are normally distributed.
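The JB statistic is a simple function of the residuals' skewness S and kurtosis K: JB = (n/6)[S² + (K - 3)²/4]. Under normality it is asymptotically χ² with 2 df, so its p value is exp(-JB/2). The sketch below applies that textbook formula to the summary statistics printed in Figure E.1 (n = 23, S = -0.593013, K = 3.752631). Some packages adjust n for the number of estimated coefficients, so a package's printed value may differ slightly from this hand calculation.

```python
import math


def jarque_bera(n: int, skewness: float, kurtosis: float):
    """Textbook JB statistic and its asymptotic chi-square(2) p value."""
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # upper tail of chi-square with 2 df
    return jb, p_value


if __name__ == "__main__":
    jb, p = jarque_bera(n=23, skewness=-0.593013, kurtosis=3.752631)
    print(f"JB = {jb:.4f}, p value = {p:.4f}")
```

This gives a JB value of about 1.89 with a p value of about 0.39, so the hypothesis of normally distributed errors is not rejected.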

Note that EViews does not directly give the analysis-of-variance (ANOVA) table, but it
can be constructed easily from the residual sum of squares, the total sum of
squares (which has to be derived from the standard deviation of the dependent
variable), and their associated degrees of freedom. The F value obtained from this exercise
should be equal to the F value reported in the first part of the output.
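Here is a sketch of that construction. It assumes the usual identities: TSS = (n - 1)s_Y², where s_Y is the S.D. of the dependent variable; ESS = TSS - RSS; and F = [ESS/(k - 1)]/[RSS/(n - k)] with k estimated coefficients. Using the Figure E.1 values (RSS = 6.828312, s_Y = 1.168713, n = 23, k = 3), it recovers F ≈ 34.007, R² ≈ 0.7728, and adjusted R² ≈ 0.7500. The function name is illustrative.

```python
def anova_from_summary(rss: float, sd_y: float, n: int, k: int) -> dict:
    """Rebuild the ANOVA quantities from RSS and the S.D. of the dependent variable.

    k is the number of estimated coefficients, including the intercept.
    """
    tss = (n - 1) * sd_y ** 2  # total sum of squares
    ess = tss - rss            # explained (regression) sum of squares
    df_reg, df_res = k - 1, n - k
    ms_reg, ms_res = ess / df_reg, rss / df_res
    r2 = ess / tss
    return {
        "ESS": ess, "RSS": rss, "TSS": tss,
        "df": (df_reg, df_res, n - 1),
        "MS": (ms_reg, ms_res),
        "F": ms_reg / ms_res,
        "R2": r2,
        "adj_R2": 1.0 - (1.0 - r2) * (n - 1) / df_res,
    }


if __name__ == "__main__":
    table = anova_from_summary(rss=6.828312, sd_y=1.168713, n=23, k=3)
    for key, value in table.items():
        print(key, value)
```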

E.2 MINITAB 


Using Version 15 of MINITAB, and using the same data, we obtained the regression results 
shown in Figure E.2. 

MINITAB first reports the estimated multiple regression. This is followed by a list of
predictor (i.e., explanatory) variables, the estimated regression coefficients, their standard
errors, the T (= t) values, and the p values. In this output S represents the standard error of
the estimate, and R² and adjusted R² values are given in percent form.

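This is followed by the usual ANOVA table. One characteristic feature of the MINITAB ANOVA
table is that it breaks down the regression, or explained, sum of squares among predictors.
Thus, of the total regression sum of squares of 23.226, the share of CUNR is 21.404
and that of AHE82 is 1.822, suggesting that, relatively, CUNR has more impact on CLFPR
than AHE82. (These are sequential sums of squares: each is the increase in the regression
sum of squares when that predictor is added after those listed above it, so the split depends
on the order in which the predictors are entered.)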

A unique feature of the MINITAB regression output is that it reports "unusual"
observations, that is, observations that are somehow different from the rest of the observations in
the sample. We have a hint of this in the residual graph given in the EViews output, for it
shows that observations 1 and 23 are substantially away from the zero line shown there.
MINITAB also produces a residual graph similar to the EViews residual graph. The
St Resid column in this output gives the standardized residuals, that is, the residuals divided
by their estimated standard errors, S√(1 - h), where S is the standard error of the estimate
and h is the leverage of the observation.
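As a check on the St Resid column of Figure E.2, the following sketch assumes that MINITAB's standardized residual divides each residual by its own estimated standard error, S√(1 - h), where h is the observation's leverage. It further assumes that the SE Fit column equals S√h, so h = (SE Fit/S)². Plugging in the printed values reproduces the reported -2.50 and 2.06, whereas dividing by S alone would give about -2.41 and 1.75. The function name is illustrative.

```python
import math


def standardized_residual(residual: float, s: float, se_fit: float) -> float:
    """Residual divided by its estimated standard error S * sqrt(1 - h).

    Assumes SE Fit = S * sqrt(h), so the leverage is h = (SE Fit / S) ** 2.
    """
    leverage = (se_fit / s) ** 2
    return residual / (s * math.sqrt(1.0 - leverage))


if __name__ == "__main__":
    s = 0.584117  # S from Figure E.2
    for obs, resid, se_fit in ((1, -1.409, 0.155), (23, 1.025, 0.307)):
        print(obs, round(standardized_residual(resid, s, se_fit), 2))
```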

Like EViews, MINITAB also reports the Durbin-Watson statistic and gives the histogram
of residuals. The histogram is a visual picture: if its shape resembles the normal
distribution, the residuals are perhaps normally distributed. The normal probability plot
accomplishes the same purpose: if the estimated residuals lie approximately on a straight
line, we can say that they are approximately normally distributed. The Anderson-Darling (AD) statistic,
an adjunct of the normal probability plot, tests the hypothesis that the variable under
consideration (here the residuals) is normally distributed. If the p value of the calculated AD
statistic is reasonably high, say in excess of 0.10, we do not reject the hypothesis that the variable is normally
distributed. In our example the AD statistic has a value of 0.481 with a p value of about
0.21, or 21 percent. So we cannot reject the hypothesis that the residuals obtained from the regression
model are normally distributed.
FIGURE E.2 MINITAB output of civilian labor force participation rate.

Regression Analysis: CLFPR versus CUNR, AHE82

The regression equation is

CLFPR = 81.0 - 0.672 CUNR - 1.41 AHE82

Predictor        Coef    SE Coef        T       P
Constant       80.951      4.770    16.97   0.000
CUNR         -0.67163    0.08270    -8.12   0.000
AHE82         -1.4104     0.6103    -2.31   0.032

S = 0.584117   R-Sq = 77.3%   R-Sq(adj) = 75.0%

Analysis of Variance

Source           DF        SS        MS        F
Regression        2    23.226    11.613    34.04
Residual Error   20     6.824     0.341
Total            22    30.050

Source   DF   Seq SS
CUNR      1   21.404
AHE82     1    1.822

Unusual Observations
Obs   CUNR    CLFPR      Fit   SE Fit   Residual   St Resid
  1   7.10   63.800   65.209    0.155     -1.409     -2.50R
 23   5.80   66.600   65.575    0.307      1.025      2.06R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 0.787065



E.3 Excel 


Using Microsoft Excel we obtained the regression output shown in Table E.2. 

Excel first presents summary statistics, such as R², multiple R, which is the (positive)
square root of R², adjusted R², and the standard error of the estimate. Then it presents the
ANOVA table. After that it presents the estimated coefficients, their standard errors, the t
values of the estimated coefficients, and their p values. It also gives the actual and estimated
TABLE E.2 Excel Output of Civilian Labor Force Participation Rate

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.879155
R Square              0.772914
Adjusted R Square     0.750205
Standard Error        0.584117
Observations          23

ANOVA
              df          SS          MS           F    Significance F
Regression     2    23.22572    11.61286    34.03611          3.65E-07
Residual      20    6.823846    0.341192
Total         22    30.04957

              Coefficient   Standard Error      t Stat    p-value    Lower 95%    Upper 95%
Intercept        80.95122         4.770337    16.96971   2.42E-13     71.00047     90.90196
CUNR            -0.671631         0.082705   -8.120845   9.24E-08     -0.84415    -0.499112
AHE82           -1.410432         0.610348   -2.310867   0.031626    -2.683594     -0.13727


values of the dependent variable and the residual graph as well as the normal probability
plot.

A useful feature of Excel is that it gives the 95 percent (or any specified percent)
confidence interval for the true values of the estimated coefficients. Thus, the estimated value of
the coefficient of CUNR is -0.671631 and the 95 percent confidence interval for the true value
of the CUNR coefficient is (-0.84415, -0.499112). This information is very valuable for
hypothesis testing: since this interval does not include zero, the hypothesis that the true
CUNR coefficient is zero can be rejected at the 5 percent level.
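The Excel intervals are the usual coefficient ± t × standard error, where t is the upper 2.5 percent point of the t distribution with n - k = 20 df. The sketch below hard-codes that point as 2.085963, a value taken from a standard t table rather than computed here. It reproduces the CUNR and AHE82 intervals in Table E.2 to about four decimal places. The names are illustrative.

```python
def conf_interval(coef: float, se: float, t_crit: float):
    """Two-sided interval coef -/+ t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


# Upper 2.5 percent point of t with 20 df, taken from a standard t table (not computed here).
T_025_DF20 = 2.085963

if __name__ == "__main__":
    for name, coef, se in (("CUNR", -0.671631, 0.082705), ("AHE82", -1.410432, 0.610348)):
        lo, hi = conf_interval(coef, se, T_025_DF20)
        excludes_zero = not (lo <= 0.0 <= hi)
        print(f"{name}: ({lo:.6f}, {hi:.6f}); zero outside interval: {excludes_zero}")
```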


E.4 STATA 


Using STATA we obtained the regression results shown in Table E.3. 

STATA first presents the analysis of variance table along with summary statistics such
as R², adjusted R², and the root mean squared error (Root MSE), which is just the standard error
of the regression.

Then it gives the values of the estimated coefficients, their standard errors, their t values,
the p values of the t statistics, and the 95 percent confidence interval for each of the
regression coefficients, which is similar to the Excel output.

E.5 Concluding Comments 

We have given just the basic output of these packages for our example. But it may be noted
that packages such as EViews and STATA are very comprehensive and contain many of the
econometric techniques discussed in this text. Once you know how to access these
packages, running various subroutines is a matter of practice. If you wish to pursue
econometrics further, you may want to buy one or more of these packages.
TABLE E.3 STATA Output of Civilian Labor Force Participation Rate

Stata 8.0, Copyright 1984-2003, Stata Corporation
Statistics/Data Analysis
Project: Data of Table E.1

. regress clfpr cunr ahe82

      Source |       SS       df       MS              Number of obs =      23
-------------+------------------------------           F(  2,    20) =   34.04
       Model |  23.2256929     2  11.6128465           Prob > F      =  0.0000
    Residual |  6.82384072    20  .341192036           R-squared     =  0.7729
-------------+------------------------------           Adj R-squared =  0.7502
       Total |  30.0495337    22  1.36588789           Root MSE      =  .58412

------------------------------------------------------------------------------
       clfpr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cunr |  -.6716305    .0827045   -8.12   0.000    -.8441491   -.4991119
       ahe82 |  -1.410433    .6103473   -2.31   0.032    -2.683595   -.1372707
       _cons |   80.95122    4.770334   16.97   0.000     71.00048    90.90197
------------------------------------------------------------------------------


References

www.eviews.com
www.stata.com
www.minitab.com
Microsoft Excel

R. Carter Hill, William E. Griffiths, George G. Judge, Using Excel for Undergraduate
Econometrics, John Wiley & Sons, New York, 2001.













Appendix


Economic Data on the World Wide Web*


Economic Statistics Briefing Room: An excellent source of data on output, income, 
employment, unemployment, earnings, production and business activity, prices and 
money, credits and security markets, and international statistics. 

http://www.whitehouse.gov/fsbr/esbr.html 

Federal Reserve System Beige Book: Gives a summary of current economic 
conditions by Federal Reserve District. There are 12 Federal Reserve Districts. 

http://www.federalreserve.gov/FOMC/BEIGEBOOK 

National Bureau of Economic Research (NBER) Home Page: This highly regarded 
private economic research institute has extensive data on asset prices, labor, 
productivity, money supply, business cycle indicators, etc. NBER has many links to 
other Web sites. 

http://www.nber.org 

Panel Study: Provides data on longitudinal survey of representative sample of U.S. 
individuals and families. These data have been collected annually since 1968. 

http://psidonline.isr.umich.edu/ 

Resources for Economists on the Internet: Very comprehensive source of information
and data on many economic activities with links to many Web sites. A very
valuable source for academic and nonacademic economists.

http://rfe.org/ 

American Stock Exchange: Information on some 700 companies listed on the second
largest stock market.

http://www.amex.com/ 

Bureau of Economic Analysis (BEA) Home Page: This agency of the U.S. Department
of Commerce, which publishes the Survey of Current Business, is an excellent
source of data on all kinds of economic activities.

http://www.bea.gov/

*Adapted from Annual Editions: Microeconomics 98/99, ed. Don Cole, Dushkin/McGraw-Hill,
Connecticut, 1998. It should be noted that this list is by no means exhaustive. The sources listed here
are updated continually.
CIA Publications: This source includes the World Fact Book (annual) and Handbook
of International Statistics. 

http://www.cia.gov/library/publications 

Energy Information Administration (DOE): Economic information and data on each
fuel category.

http://www.eia.doe.gov/

FRED Database: Federal Reserve Bank of St. Louis publishes historical economic 
and social data, which include interest rates, monetary and business indicators, 
exchange rates, etc. 

http://research.stlouisfed.org/fred2/ 

International Trade Administration: Offers many Web links to trade statistics, cross-country
programs, etc.

http://trade.gov/index.asp

STAT-USA Databases: The National Trade Data Bank provides the most comprehensive
source of international trade data and export promotion information. There are also
extensive data on demographic, political, and socioeconomic conditions for several
countries.

http://www.stat-usa.gov/

Statistical Resources on the Web/Economics: An excellent source of statistics
collated from various federal bureaus, economic indicators, the Federal Reserve Board,
data on consumer prices, and Web links to other sources.

http://www.lib.umich.edu/govdocs/stats.html 

Bureau of Labor Statistics: The home page provides data related to various aspects 
of employment, unemployment, and earnings, as well as links to other statistical Web 
sites. 

http://www.stats.bls.gov/

U.S. Census Bureau Home Page: Prime source of social, demographic, and
economic data on income, employment, income distribution, and poverty.

http://www.census.gov/

General Social Survey: Annual personal interview survey data on U.S. households,
begun in 1972. More than 35,000 respondents have answered some 2,500 different
questions covering a variety of topics.
http://www.norc.org/GSS+Website/

Institute for Research on Poverty: Data collected by a nonpartisan, nonprofit,
university-based research center on a variety of questions relating to poverty and social
inequality.

http://www.irp.wisc.edu/ 

Social Security Administration: The official Web site of the Social Security Administration,
with a variety of data.

http://www.ssa.gov/
Granger representation theorem, 764
Graphical analysis, 749 
Gravity, law of, 19 

Gross domestic product (GDP), 5-7, 91, 
738, 739 

Gross national product (GNP), 2 
Grouped data, 556-561, 567-570 
Grouped logit (glogit) model, 558-561 
Grouped probit (gprobit) model, 567-570 
Growth rate, instantaneous vs. 
compound, 164 

Growth rate formulas, 186-187 
Growth rate measurement, 162-164 


H 

h statistic, 465, 637
HAC standard errors 

(see Heteroscedasticity- and 
autocorrelation-consistent 
standard errors) 

Hamburger standard, 140 

Handbook of International Statistics, 901 

Hat (^), 5n

Hausman test, 603, 683, 703-704 
Hazard rate, 575 
Heterogeneity, 594 
Heterogeneity effect, 595 
Heterogeneity problem, 23 
Heteroscedastic variances, 544-545 
Heteroscedasticity, 365-401 
and autocorrelation, 450 
defined, 65 

detection of, 376-389 
Breusch-Pagan-Godfrey test, 385-386

formal methods, 378 
Glejser test, 379-380 
Goldfeld-Quandt test, 382-384 
graphical method, 377-378 
Heteroscedasticity (Cont.)

informal methods, 376-378 
Koenker-Basset test, 388-389 
nature of problem, 376-377 
Park test, 378-379 
selection of test, 389 
Spearman’s rank correlation test, 
380-382 

White’s general test, 386-388 
and dummy variables, 298-299 
examples of, 395-399 
GLS method of correcting for, 371-374 
nature of, 365-370 

OLS estimation in presence of, 370-371, 
374-376 

overreacting to, 400 
patterns of, 391-395 
remedial measures for, 389-395 
assumptions about pattern of 
heteroscedasticity, 391-395 
White’s heteroscedasticity-consistent 
variances/standard errors, 391 
WLS, 389-390 

White’s standard errors corrected for, 411 
Heteroscedasticity- and autocorrelation- 
consistent (HAC) standard errors, 
447-448 

Heteroscedasticity-consistent covariance 
matrix estimators, 391n 
Higher moments of probability 
distributions, 815-816 
Histogram of residuals, 130-131 
Historical regression, 126 
Holt’s linear method, 774 
Holt-Winters’ method, 774 
Homoscedasticity (assumption 4), 

64-66, 365 

Hypothesis statement, 3 
Hypothesis testing, 113-124, 831-837 
about individual regression coefficients 
in matrix notation, 859-860 
accepting or rejecting hypothesis, 119 
choosing approach to, 124 
choosing level of significance, 121-122 
in classical theory of statistical 
inference, 97 

confidence-interval approach, 831-836 
confidence-interval approach to, 113-115 
as econometric modeling step, 7-8 
exact level of significance, 122-123 
forming null/alternative hypotheses, 121 
in multiple regression, 234-237, 
259-260 

statistical vs. practical significance, 
123-124 

test-of-significance approach, 115-119, 
836-837 

zero null hypothesis and 2-t rule of thumb, 120


i (subscript), 21 
Identification: 

in BJ methodology, 778-782 
order condition, 699-700 
rank condition, 700-703 
rules for, 699-703 

Identification problem, 671-672, 689-703 
defined, 692 

exact identification, 694-697 
notations/definitions used in, 689-692 
overidentification, 697-698 
underidentification, 692-694 
Identity matrix, 840 
Idiosyncratic term, 603 
“Ignorable case,” 499, 500 
ILS (see Indirect least squares)

Impact multipliers, 619, 691 
Impulse response function (IRF), 789 
Impulses, 785 
Imputing values, 499 
Inclusion, of irrelevant variables, 469,
473-474, 520-521
Income multiplier (M), 8 
Incremental contribution of explanatory 
variable, 243-246 
Independent variable, 3 
Indifference curves, 28 
Indirect least squares (ILS), 691,
715-718, 735

Individual prediction, 128-129, 146, 862 
Individual probability density function, 805 
Individual regression coefficients, 235-237 
Individual-level data, 556, 561-566, 
570-571, 589-590 
Inertia, 414 

Infinite (lag) model, 623 
Influential point, 497 
Innovations, 785 
In-sample forecasting, 491 
Instantaneous rate of growth, 164 
Institute for Research on Poverty, 901 
Institutions, 622 
Instrument validity, 669-670 
Instrumental variables, 485, 718 
Instrumental variables (IV) method, 
636-637 

Integrated of order 1, 746 
Integrated of order 2, 746 
Integrated of order d, 747 
Integrated processes, 746-747 
Integrated stochastic processes, 746-747 
Integrated time series, 747 
Interaction among regressors, 470 
Interaction dummy, 289-290 
Interaction term, 263, 549 
Interactive form, 287 
Intercept, 3 


Intercept coefficient, 37 
Intercorrelation, measurement of, 321n 
Interest rates: 

and Federal Reserve, 642-643 
and investments/sales, 666 
and money, 655-656 
and money/GDP/CPI, 709 
Internal Revenue Service (IRS), 27 
International Trade Administration, 901 
Internet, 25 
Interpolation, 417 

Interval estimation, 108-112, 824-825 
confidence interval for σ², 111-112
confidence intervals for regression
coefficients β₁ and β₂, 109-111
defined, 108 

Interval estimators, 59, 108 
Interval scale, 28 

Intrinsically nonlinear regression models, 
525-526 

Invariance property, 830 

Inverse Mills ratio, 575 

Inverse of square matrix, 847 

Inversion, matrix, 843 

Inverted V distributed-lag model, 664 

Investment data, 25, 26

IRF (impulse response function), 789 

Irrelevant variables: 

inclusion of, 469, 473-474 
tests for, 475-476

and unbiasedness property, 520-521 
IRS (Internal Revenue Service), 27 
IS model of macroeconomics, 677-678 
Iterated expectations, law of, 815 
Iterative linearization method, 530 
Iterative methods, 446-447
Iterative process, 529 
IV method (see Instrumental
variables method)


J curve of international economics, 621
J test, 490-492

Jarque-Bera (JB) test, 131, 132, 819
Joint confidence interval, 111 
Joint probability density functions, 805 
Just identification (see Exact identification)


K 

K normal equations, 874 
KB test {see Koenker-Basset test) 
Keynesian consumption function, 3-5, 7 
Keynesian model of income determination, 
675-676 

KISS principle, 511 
Klein’s model I, 679, 725-726
Klein’s rule of thumb, 339
Knot (known in advance threshold), 296 
Koenker-Basset (KB) test, 388-389 
Koyck model, 624-629 
and adaptive expectations model, 
629-631 

combining adaptive expectations and 
partial adjustment models, 634 
example using, 627-629, 631 
mean lag in, 627 
median lag in, 627 

and partial adjustment model, 632-633 
Koyck transformation, 626 
Kruskal’s theorem, 376n, 422 
Kurtosis, 131, 132, 815, 816
k-variable linear regression model, 849-851


Labor economics, 17, 18 
Labor force participation (LFP), 51, 541, 
549-551, 872 

Lag(s): 

and autocorrelation, 416-417
in economics, 618-622 
length of, 753 
reasons for, 622-623 
Lag operator, 744n 
Lagged endogenous variables, 690 
Lagged values, 417 
Lagrange multiplier (LM) model, 678 
Lagrange multiplier (LM) test, 259-260, 
481-482 (See also Breusch-Godfrey test)

Lag-weighted average of time, 627 
Large sample theory, 510 
Large-sample properties, 96, 828-831 
Latent variable, 566, 603 
Law of gravity, 19 
Law of iterated expectations, 815 
Law of universal regression, 15 
LB (Ljung-Box) statistic, 754 
Lead terms, 667 

Leamer-Schwarz critical values, 836 
Least-squares criterion, 56 
Least-squares dummy variable (LSDV) 
model, 596-599 
Least-squares estimates: 
derivation of, 92 

precision/standard errors of, 69-71 
two-stage (see Two-stage least squares) 
Least-squares estimator(s), 59 
consistency of, 96 
linearity/unbiasedness of, 92-93 
minimum variance of, 95-96 
ordinary (see Ordinary least squares) 
properties of, 71-73 


for regression through the origin, 
182-183 
of σ², 93-94

variances/standard errors of, 93 
Leptokurtic, 816 
Level form, 418 

Level of significance, 108, 824, 834 
choosing, 121-122 
exact, 122-123 

in presence of data mining, 475-476 
Leverage, 497, 498
LF (see Likelihood function) 

LFP (See Labor force participation) 

LGDP time series, 751-752 
Life-cycle permanent income hypothesis, 10 
Likelihood function (LF), 103, 590, 825 
Likelihood ratio (LR) statistic, 563 
Likelihood ratio (LR) test, 259-260, 
274-276 

Limited dependent variable regression 
models, 574 

Limited information methods, 711 
Linear equality restrictions testing, 
248-254 

F-test approach, 249-254 
t-test approach, 249
Linear function, 38n 
Linear in parameter (assumption 1), 62 
Linear population regression function, 37 
Linear PRF, 37 

Linear probability model (LPM), 543-549 
alternatives to, 552-553 
applications of, 549-552 
defined, 543 

effect of unit change on regressor 
value in, 571 
example, 547-549 
goodness of fit, 546-547 
heteroscedastic variances of 
disturbances, 544-545 
nonfulfillment of E between 0 and 1, 545 
non-normality of disturbances, 544 
Linear regression model(s), 38, 39 
estimation of, 527 
example of, 4 
log-linear vs., 260-261 
nonlinear vs., 525-526 
Linear trend model, 164 
Linearity, 38-39 
of BLUE, 71 

of least-squares estimators, 92-93 
in parameters, 38-39 
in variables, 38 

Linearization method, 537-538 
Lin-log model, 162, 164-166 
Ljung-Box (LB) statistic, 754 
LLF (See Log-likelihood function) 

LM (Lagrange multiplier) model, 678 
LM test (see Lagrange multiplier test) 


Log hyperbola model, 172 
Logarithmic reciprocal model, 172 
Logarithms, 184-186 
Logistic distribution function, 526, 554 
Logistic growth model, 532 
Logit model, 553-555 

effect of unit change on regressor 
value in, 571 
estimation of, 555-558 
grouped, 558-561 
ML estimation, 589-590 
multinomial, 580 
ordinal, 580 
probit vs., 571-573
ungrouped data, 561-566 
Log-likelihood function (LLF), 590, 825 
Log-lin model, 162-164 
Log-linear model, 159-162, 260-261 
Log-log model, 159 
Log-normal distribution, 174 
Long panel, 593 

Longitudinal data (see Panel data) 
Longley data, 347-350 
Long-run multiplier, 619 
Lower confidence limit, 108 
LPM (see Linear probability model) 

LR (likelihood ratio) statistic, 563 
LR test (see Likelihood ratio test) 

LSDV model (see Least-squares dummy 
variable model) 

Lucas technique, 774 
Lurking variables, 598 


M 

MA (see Moving average) 

Maintained hypothesis, 113, 475 
Mallows’s Cp criterion, 488, 494-495
Manipulation of data, 417 
Manufacturing wages and exports, 49 
Marginal contribution of explanatory 
variable, 243-246 

Marginal probability density function, 
805-806 

Marginal propensity to consume (MPC), 3, 
7, 17, 81 

Marginal propensity to save (MPS), 256 
Market Model of portfolio theory, 148, 149 
Markov first-order autoregressive 
scheme, 419 
Marquardt method, 530n
Mathematical economics, 2 
Mathematical model of consumption, 3-4 
Matrix(-ces): 
adjoint, 846 
cofactor, 846 
defined, 838 
diagonal, 839 
Matrix(-ces) (Cont.)
equal, 840 
identity/unit, 840 
null, 840 
null vector, 840 
rank of, 845-846 
scalar, 840 
square, 839 
symmetric, 840 
Matrix addition, 840-841 
Matrix algebra, 838-848 
definitions, 838-839 
determinants, 843-846 
differentiation, matrix, 848 
inverse of square matrix, finding, 847 
operations, 840-843 
types of matrices, 839-840 
Matrix approach to linear regression 
model, 849-869 

ANOVA in matrix notation, 860-861 
assumptions of CLRM in matrix 
notation, 851-853 

coefficient of determination in matrix 
notation, 858 
correlation matrix, 859 
example of, 863-867 
general F testing using matrix 
notation, 861 

generalized least squares, 867-868 
hypothesis testing about individual 
regression coefficients in matrix 
notation, 859-860 
k-variable linear regression model,
851-853 

OLS estimation, 853-858 
prediction using multiple
regression/matrix formulation,
861-862

Matrix differentiation, 848 
Matrix inversion, 843 
Matrix multiplication, 841-843 
Matrix operations, 840-843 
addition, 840-841 
inversion, 843 
multiplication, 841-843 
scalar multiplication, 841 
subtraction, 841 
transposition, 843 
Matrix subtraction, 841 
Matrix transposition, 843 
Maximum likelihood (ML), 230, 556 
example of, 105 
method of, 102 

of two-variable regression model, 
103-105 

Mean prediction, 127-128, 145-146, 
861-862 

Mean reversion, 741 
Mean value, 34n 

Mean-square-error (MSE) estimator, 827-828 


Measurement, errors of, 27, 482-486 
Measurement scales, 27-28 
Mesokurtic, 816 

Method of moments (MOM), 86, 826 
Mexican economy, 532, 537 
Micronumerosity, 326, 332 
Micropanel data (see Panel data) 

Minimum variance, 95-96, 826, 827 
Minimum-variance unbiased estimators, 
100, 827 

MINITAB, 896-897 
Minor determinant, 846 
Missing data, 499-500 
ML (see Maximum likelihood)

ML estimators, 196, 825-826 
Model (term), 3 

Model mis-specification errors, 470 
Model selection criteria, 468, 493-496
adjusted R², 493

Akaike’s information criterion, 494 
caution about criteria, 495-496
forecast chi-square, 496 
Mallows’s Cp criterion, 494-495
R² criterion, 493

Schwarz’s information criterion, 494 
Model specification bias, 467 
Model specification errors, 467 
consequences of, 470-474
tests of, 474-482 

Durbin-Watson d statistic, 477-479 
Lagrange multiplier test for adding 
variables, 481-482 
nominal vs. true level of significance,
475-476

omitted variables detection, 477-482 
Ramsey’s RESET test, 479-481
residuals examination, 477 
unnecessary variables detection,
475-476

types of, 468-470 
Modified d test, 437 
Modified Phillips curve, 170 
MOM (see Method of moments) 

Moment, 86 

Monetary economics, 17, 18 
Money market equilibrium, 678 
Money stock measures, 139 
Money supply function, 718 
Monte Carlo experiments, 12, 83-84, 
682-683 
Monthly data, 22 

Moving average (MA), 438, 439, 776 
MPC (see Marginal propensity to consume) 
MPS (marginal propensity to save), 256 
MSE estimator 

(see Mean-square-error estimator) 
Multicollinearity, 320-351 
assumption of no, 189 
defined, 321 
detection of, 337-341 


effects of, 347 
example, 332-337 
factors in, 323 

high but imperfect, 325-326 
Longley data example, 347-350 
nature of, 321-323 
perfect, 324-325 

practical consequences of, 327-332 
confidence intervals, 330 
micronumerosity, 332
OLS-estimator variance, 328-330 
sensitivity to small changes in data, 
331-332 
t ratios, 330, 331 
remedial measures, 342-346 
doing nothing, 342 
rule-of-thumb procedures, 342-346 
theoretical consequences of, 326-327 
Multinomial models, 580 
Multiple coefficient of correlation, 198 
Multiple coefficient of determination, 
196-197 

Multiple regression: 

estimation problem, 188-215 
hypothesis testing 

about individual regression 
coefficients, 235-237 
forms of, 234-235 
with LR/W/LM tests, 259-260 
inference problem, 233-262 
likelihood ratio test, 274-276 
linear equality restrictions testing, 
248-254 

F-test approach, 249-254 
t-test approach, 249
linear vs. log-linear models, 260-261 
maximum likelihood estimation, 230 
normality assumption, 233-234 
overall significance testing, 237-246 
ANOVA, 238-240 
F test, 238-241
incremental contribution of 
explanatory variable, 243-246 
R² and F relationship, 241-242
in terms of R², 242-243
partial correlation coefficients, 213-215 
polynomial regression models, 210-213 
prediction with, 259 
specification bias in, 200-201 
structural/parameter stability testing, 
254-259 

testing equality of two regression 
coefficients, 246-248 
three-variable model 
adjusted R², 201-207
Cobb-Douglas production function, 
207-209 

estimation of partial regression 
coefficients, 192-196 
example, 198-200 


Subject Index 917 


interpretation of regression 
equation, 191 

multiple coefficient of correlation, 198 
multiple coefficient of determination, 
196-197 

notation/assumptions, 188-190 
partial regression coefficients, 191-192 
standardized variables, regression on, 
199-200 

Multiple regression analysis, 21 
Multiple regression model, 14 
Multiple-equation model, 3 
Multiplication, matrix, 841-843 
Multiplicative effect, 470 
Multiplicative form, 287 
Mutual fund advisory fees, 530-531
Mutually exclusive events, 802 
MWD test, 260-261 


N 

N (number of observations), 21 
National Bureau of Economic Research 
(NBER), 900 

National Trade Data Bank, 901 
Natural logarithms, 184, 185 
Nature of X variables (assumption 7), 68 
NBER (National Bureau of Economic 
Research), 900 

N.e.d. (normal equivalent deviate), 568 
Negative correlation, 66 
Neo-classical linear regression model 
(NLRM), 63 
Nested models, 487 
Newey-West method, 441, 447-448 
Newton-Raphson iterative method, 530 
Newton’s law of gravity, 19 
NID (normally and independently 
distributed), 98 

NLLS (nonlinear least squares), 527 
NLRM (see Nonlinear regression models) 
NLRM (neo-classical linear regression 
model), 63 

No autocorrelation between disturbances 
(assumption 5), 66-67 
Nominal level of significance, 475-476
Nominal regressand, 542 
Nominal scale, 28 
Nonexperimental data, 25, 27
Nonlinear least squares (NLLS), 527 
Nonlinear regression models (NLRM), 38, 
39, 525-535 
direct optimization, 529 
direct search method, 529 
estimation of, 527 
examples, 530-534 
iterative linearization method, 530 
linear vs., 525-526 
trial-and-error method, 527-529 


Non-nested F test, 488-489
Non-nested hypotheses tests, 488-492 
Davidson-MacKinnon J test, 490-492 
discerning approach, 488-492 
discrimination approach, 488 
non-nested F test, 488-489
Non-nested models, 487 
Non-normal error distribution, 509-510 
Non-normality, of disturbances, 544 
Nonparametric statistical methods, 758 
Nonparametric tests, 432n 

Nonsense regression, 737 
Nonsingular matrix, 844 
Nonstationary stochastic processes, 741-744 
Nonstationary time series, 741, 760-762 
Nonsystematic component, 40 
Normal distribution, 143-144, 816-819 
Normal equations, 58, 527, 875 
Normal equivalent deviate (N.e.d.), 568 
Normal probability plot (NPP), 131, 132 
Normality (assumption 10), 233-234 
for disturbances, 98 
properties of OLS estimators under, 
100-101 

reasons for using, 99-100 
of stochastic distribution, 315, 318 
Normality tests, 130-132 

histogram of residuals, 130-131 
Jarque-Bera test, 131, 132
normal probability plot, 131, 132 
Normally and independently distributed 
(NID), 98 
Normit, 568 

Normit model (see Probit model) 

Not statistically significant, 114 

NPP (see Normal probability plot) 

Nuisance parameters, 596 

Nuisance variables, 598 

Null hypothesis, 113, 120, 121, 235n, 831 

Null matrix, 840 

Null vector, 840 

Number crunching, 475 

Numerator degrees of freedom, 144 

Numerical properties, of estimators, 59 

NYSE price changes example, 794-795 


O 

Observational data: 

assumption about, 67-68 
experimental vs., 2 
quantity of, 67-68 

Odds ratio, 554 

Ohm’s law, 19 

OLS (see Ordinary least squares) 

OLS estimation, 853-858 
and autocorrelation, 418-427 


and heteroscedasticity, 370-371, 
374-376 

illustration, 855-856 
properties of OLS vector β̂, 858
variance-covariance matrix of β̂,
856-857 

OLS estimators, 192-196 
derivation of, 227-228 
inconsistency of, 679-682 
multicollinearity and variance of, 
328-330 

properties, 100-101 
properties of, 195-196 
sensitivity of, 331-332 
variances and standard errors of, 
194-195 

OLS standard-error correction, 447-448
OLS vector, 858 

Omission, of relevant variable, 469, 471-473
Omitted category, 281 
Omitted variables, 477-482 
One-sided hypothesis, 115 
One-tail hypothesis test, 115 
One-tail test of significance, 117, 118
One-way fixed effects, 598 
Order, 838 

Order condition of identifiability, 699-700 
Ordinal models, 580 
Ordinal regressand, 542 
Ordinal scale, 28 

Ordinary least squares (OLS), 55-85 
(See also OLS estimation; OLS 
estimators) 
assumptions, 61-69 
BLUE property of, 875-876 
examples of, 81-83 
Gauss-Markov theorem, 71-73 
GLS vs., 373-374 
goodness of fit, 73-78 
method of, 55-61 

and Monte Carlo experiments, 83-84 
precision/standard errors, 69-71 
and recursive models, 712-714 
Orthogonal polynomials, 346 
Orthogonal variables, 355 
Outliers, 367, 496-498
Out-of-sample forecasting, 491 
Overall significance testing: 

ANOVA, 238-240 
F test, 240-241

incremental contribution of explanatory 
variable, 243-246 
individual vs. joint, 241 
in multiple regression, 237-246 
R² and F relationship, 241-242
in terms of R², 242-243
Overdifferencing, 761 
Overfitting, of model, 473-474
Overidentification, 697-698 
Overidentified equation, 718-721 
Π (product operator), 802
p value, 835 

Pair-wise correlations, 338 

PAM (see Partial adjustment model) 

Panel data, 23, 25, 26, 591
Panel data models, 591-613 
advantages of, 592-593 
dummy variables in, 297 
estimators, properties of, 605-606 
examples of, 593-594, 607-612 
fixed effect LSDV model, 596-599 
fixed effect within-group estimator, 
599-602 

pooled OLS regression model, 594-596 
random effects model, 602-605 
selection guidelines, 606-607 
Panel Study, 900 

Panel Study of Income Dynamics 
(PSID), 591 

Panel-corrected standard errors, 606 
Parallel regressions, 285, 286 
Parameter constancy, 468 
Parameters, 3 

Park test, 378-379, 396-398 
Parsimony, 42 

Partial adjustment model (PAM), 632-634 
Partial correlation coefficients, 213-215 
Partial correlations, 338-339 
Partial regression coefficients, 189, 191-198 
PCE (see Personal consumption 
expenditure) 

PDF (see Probability density function) 

PDL (see Polynomial distributed lag) 
Percent growth rate, 160n 
Percentage change, 160n 
Percentages, logarithms and, 185-186 
Perfect collinearity, 281 
Perfect multicollinearity, 324-325 
Permanent consumption, 42 
Permanent income hypothesis, 9-10, 42, 
148, 468 

Personal computers, 82-83 
Personal consumption expenditure (PCE), 
5, 6, 738, 739 

Phenomenon of spurious regression, 
747-748 

Phillips curve, 17, 18, 169-170 
Phillips-Perron (PP) unit root tests, 758 
Piecewise linear regression, 295-297 
Pindyck-Rubinfeld model of public 
spending, 704-705 
Platykurtic, 816 
Plim (probability limit), 681 
Point estimation, 107, 823-824 
Point estimators, 4, 59, 108 
Poisson distribution, 823 
Poisson process, 542 
Poisson regression model, 576-579 


Policy purposes, model used for, 9 
Polychotomous variable, 542 
Polynomial distributed lag (PDL), 645-652 
Polynomial regression, 210-213, 346 
Polytomous dependent variable, 299 
Pooled data, 23, 591 
Pooled estimators, 606 
Pooled OLS regression model, 594-596 
Pooled regression, 256 
Population, 34, 802 
Population correlogram, 749 
Population growth, 532-533 
Population regression (PR), 37 
Population regression curve, 36 
Population regression function 
(PRF), 37-41 

Population regression line (PRL), 36, 37 
Population transformation, 534 
Positive economists, 7 
Postmultiplied, 842 
Power: 

of statistical test, 440n 
of the test, 122, 383n, 834, 835 
of unit root tests, 759 
Power curve, 835 
Power function graph, 835 
PP (Phillips-Perron) unit root tests, 758 
PPP (purchasing power parity), 139 
PR (population regression), 37 
Practical significance, statistical vs., 123-124 
Prais-Winsten transformation, 443 
Precedence, 653 
Precision, 69-71 
Predetermined variables, 690 
Prediction (See also Forecasting) 
individual, 128-129, 146, 862 
matrix formulation, 861-862 
mean, 127-128, 145-146, 861-862
with multiple regression, 259 
variance of, 862 
Predictive causality, 653 
Predictor variable, 8 
Premultiplied, 842 
Pretest bias, 206n 
Pretesting, 476 

PRF (see Population regression function) 
Price elasticity, 17 

Principal components technique, 346 
PRL (see Population regression line) 
Probability, 802-803 

Probability density function (PDF), 804-808 
conditional PDF, 806 
of continuous random variable, 804 
of discrete random variable, 803-804 
joint PDFs, 805 
marginal PDF, 805 
statistical independence, 806-808 
Probability distribution(s), 100, 101, 109 
Bernoulli binomial distribution, 822 
binomial distribution, 822-823 


chi-square distribution, 819-820 
conditional expectation and conditional 
variance, 813-815 
correlation coefficient, 812-813 
covariance, 811-812 
of disturbances, 97-98 
of estimator, 824 
expected value, 808-810 
F distribution, 821-822 
higher moments of, 815-816 
normal distribution, 816-819 
normal distribution related to, 143-144 
Poisson distribution, 823 
Student’s t distribution, 820 
variance, 810-811 
Probability element, 804 
Probability limit (plim), 681 
Probability of committing Type I error, 
108n, 121 

Probit model, 566-571 

effect of unit change on regressor 
value in, 571 

with grouped data, 567-570 
logit vs., 571-573 
ML estimation, 589-590 
multinomial, 580 
ordinal, 580 

with ungrouped data, 570-571 
Problem of estimation, 823 
Product operator (Π), 802
Productivity, 89, 607-609, 621, 667 
Proxy variables, 41-42, 485
PSID (Panel Study of Income 
Dynamics), 591 
Psychology, 622 

pth-order autoregressive (AR(p)), 776
Purchasing power parity (PPP), 139 
Pure autocorrelation, 440-442 
Pure random walk, 745 
Purely random process, 741 


Q 

Q statistic, 753-754 

qth-order moving average (MA(q)), 776
Quadratic function, 210 
Qualitative response models, 541-581 
duration models, 580-581 
linear probability model, 543-553 
logit model, 553-566, 589-590 
multinomial models, 580 
nature of, 541-543 
ordinal models, 580 
Poisson regression model, 576-579 
probit model, 566-571, 589-590 
selection of model, 571-573 
tobit model, 574-577 
unit change in value of regressor in, 571 
Qualitative variables, 14 
Quality, of data, 27
Quarterly data, 22 
Quasi-difference equation, 442 
Quinquennial data, 22 


R 

R² criterion, 493

Ramsey’s RESET test, 479-481

Random (term), 21 

Random effects estimators, 606 

Random effects model (REM), 602-607 

Random interval, 108 

Random regressor case, 510, 511 

Random sample, 823 

Random (stochastic) variable, 4, 19 

Random variables, 803 

Random walk model (RWM), 741-746 

Random walk phenomenon, 737 

Random walk time series, 751 

Randomness, 41 

Rank condition of identifiability, 700-703 

Rank of matrix, 845-846 

Rare event data, 542 

Ratio scale, 28 

Ratio transformation, 345 

Rational expectations (RE) hypothesis, 631 

Raw r², 150

RE (rational expectations) hypothesis, 631 
Real consumption function, 505-509 
Realization of possibilities, 740 
Real-time quote, 22 
Real-valued function, 802n 
Reciprocal models, 166-172 
Recursive least squares (RELS), 498 
Recursive models, 712-714 
Recursive residual test, 259 
Recursive residuals, 498 
Reduced-form coefficients, 690, 691 
Reduced-form equations, 690, 691 
Reduction, of determinant, 844 
Reduction formula, 869 
Reference category, 281 
Region of acceptance, 116 
Region of rejection, 116, 833 
Regressand, 21 
Regression: 

historical origin of term, 15 
through the origin, 147-153 
on standardized variables, 157-159 
Regression analysis, 15-21, 124-136 
and analysis of variance, 124-126 
and causation, 19-20 
and correlation, 20 
data for, 22-28 
defined, 15 
for estimation, 5 
evaluating results of, 130-134 
examples of, 16-18 


measurement scales of variables, 27-28 
prediction problem, 126-129 
reporting results of, 129-130 
statistical vs. deterministic relationships 
in, 19 

terminology/notation used in, 21 
Regression coefficients, 37, 246-248 
Regression fishing, 475 
Regression line, 16 
Regression model(s), 159 
Box-Cox, 187 

elasticity measurement, 159-162 
growth measurement, 162-166 
log-linear model, 159-162 
reciprocal models, 166-172 
selection, 172-173 
semilog models, 162-166 
and stochastic error, 174-175 
Regression software, 11-12 
“Regression to mediocrity,” 15 
Regression using standardized variables, 873 
Regressor, 21 
Rejecting hypothesis, 119 
Relative (proportional) change, 160n 
Relative frequency, 557, 802 
Relevant variable, omission of, 469, 

471-473 

RELS (recursive least squares), 498 
REM (see Random effects model) 
Repeated sampling, 84 
Replicated data, 556-558 
Reproductive property, 143 
Residual sum of squares (RSS), 70, 75 
Residuals, 44, 445-446, 477 
Resources for Economists on the 
Internet, 900 

Restricted F test, 598, 758 
Restricted least squares (RLS), 

249-252, 481 

Restricted residual sum of squares (RSSr), 
256-258 

Ridge regression, 346 

RLS (see Restricted least squares) 

Robust estimation, 318n 
Robust standard errors, 391, 411
Row by column rule of multiplication, 841 
Row vector, 839 

RSS (see Residual sum of squares) 

RSSr (see Restricted residual sum 
of squares) 

RSSur (see Unrestricted residual sum 
of squares) 

Runs test, 431-434, 892-893
RWM (see Random walk model) 


S 

Σ (summation operator), 801

ΣΣ (double summation operator), 801


Sample autocorrelation function 
(SAFC), 114, 749 
Sample correlation coefficient, 77 
Sample correlogram, 749 
Sample covariance, 749 
Sample points, 802 

Sample regression function (SRF), 42-45 

Sample regression line, 44 

Sample size, 835 

Sample space, 802 

Sample variance, 749 

Sampling, 27, 824 

Sampling distribution, 69n, 73, 109, 509 

Sargan test, 669-670 

Scalar, 838 

Scalar matrix, 840 

Scalar multiplication, 841 

Scale effect, 23 

Scale factors, 154-156 

Scaling, 154-157 

Scatter diagram (scattergram), 16 

Scatterplot, 340-341 

Schwarz’s information criterion (SIC), 488, 494

Seasonal analysis, 290-295 
Seasonality, 784 

Second-order autoregressive (AR(2)), 776 
Second-order moving average (MA(2)), 776 
Second-order stationary, 740 
Security market line (SML), 148 
Seemingly unrelated regression (SURE) 
model, 599n, 714n, 785n 
Self-selection bias, 499 
Semielasticity, 163 
Semilog models, 162-166 
Semilogarithmic regressions, 297-298, 314 
Serial correlation, 412-414
Serial correlation model, 660 
Shocks, 785 
Short panel, 593 
Short-run multiplier, 619 
SIC (see Schwarz’s information criterion) 
Signed minor, 846 

Simple correlation coefficients, 213-215 
Simple hypothesis, 113, 831 
Simple regression analysis (see Two-variable regression analysis)

Sims test of causality, 652n 
Simultaneity test, 703-705 
Simultaneous equations, 874 
Simultaneous-equation bias, 679-683 
Simultaneous-equation methods, 711-730 
estimation approaches, 711-712 
bias in indirect least-squares 
estimators, 735 
examples, 724-729 
indirect least squares, 715-718 
recursive models and OLS, 712-714 
standard errors of 2SLS estimators, 736 
two-stage least squares, 718-724 
Simultaneous-equation models, 673-684
examples of, 674-679 
nature of, 673-674 
Simultaneous-equation regression 
models, 774 

Single exponential smoothing, 774 
Single-equation methods, 712 
Single-equation model, 3 
Single-equation regression models, 13, 774 
Singular matrix, 844 
Size:
of the statistical test, 108n
of unit root tests, 759 
Size effect, 23 

Skewness, 131, 132, 368, 815, 816 
Slope, 3, 37 

Slope drifter (see Differential slope 
coefficients) 

Slutsky property, 830 
Small-sample properties, 826-828 
SML (security market line), 148 
Social Security Administration, 901 
Spatial autocorrelation, 412 
Spearman’s rank correlation coefficient, 86 
Spearman’s rank correlation test, 380-382 
Specification bias, 64 
assumption regarding, 189, 367 
excluded variable, 414-415 
incorrect function form, 416 
and multicollinearity, 344 
in multiple regression, 200-201 
Specification error, 64, 150 
Spline functions, 296 
Spurious correlation, 395 
Spurious regression, 737, 747-748 
Square matrix, 839, 847 
Square root transformation, 393 
SRF (see Sample regression function) 

SRM (see Switching regression models) 

St. Louis revised model, 728-729 
Stability condition, 755n 
Standard deviation, 810 
Standard error(s): 
defined, 69n 
of estimate, 70
of least-squares estimates, 69-71
of least-squares estimators, 93 
of OLS estimators, 194-195 
of regression, 70 
in 2SLS estimators, 736
Standard linear regression model (see Classical linear regression model)
Standard normal distribution, 100 
Standardized normal distribution, 878 
Standardized normal variable, 817 
Standardized residuals, 430, 431
Standardized variables, 157-159, 183-184, 
199-200 
STATA, 898, 899 

Statement of theory or hypothesis, 3 


Stationarity, 22 
Stationarity, tests of, 748-754 
autocorrelation function/correlogram, 
749-753 

graphical analysis, 749 
statistical significance of autocorrelation 
coefficients, 753-754 
Stationary stochastic processes, 740-741 
Stationary time series, 737 
Statistic (term), 44, 823 
Statistical independence, 806-808
Statistical inference, 8
Statistical properties, 59, 69
Statistical relationships, 19, 20
Statistical Resources on the 
Web/Economics, 901 
Statistical significance: 
of autocorrelation coefficients, 753-754 
practical vs., 123-124 
Statistical tables, 878-893 
areas under standardized normal distribution, 878
critical values of runs in runs test, 892-893
Durbin-Watson d statistic, 888-891
1% and 5% critical Dickey-Fuller t and F values for unit root tests, 893
percentage points of t distribution, 879
upper percentage points of χ² distribution, 886-887
upper percentage points of F distribution, 880-885
Statistically significant, 114 
STAT-USA databases, 901 
Steepest descent method, 529 
Stepwise backward regression, 354 
Stepwise forward regression, 354 
Stochastic (term), 19n, 21 
Stochastic disturbance, 40-42
Stochastic error term, 40, 174-175, 
486-487 

Stochastic explanatory variables, 510-511 
Stochastic PRF, 48 
Stochastic processes, 740-744 
integrated, 746-747 
nonstationary, 741-744 
stationary, 740-741 
trend stationary/difference stationary, 
745-746 
unit-root, 744 

Stochastic regressor model, 63, 316-317 
Stochastic time series, 745 
Stochastic trend, 742, 745 
Stock adjustment model, 632 
Strictly exogenous regressors, 468 
Strictly exogenous variables, 594, 602 
Strictly white noise, 741n
Structural breaks, 758 
Structural changes, testing for, 254-259, 
758-759 


Structural coefficients, 690 
Structural equations, 690 
Studentized residuals, 430n 
Student’s t distribution, 820 
Student’s t test, 755 
Submatrix, 839 
Subtraction, matrix, 841 
Summation operator (Σ), 801
SURE model (see Seemingly unrelated 
regression model) 

Survival analysis, 580 
Switching regression models (SRM), 
296n, 300 

Symmetric matrix, 840 
Symmetric variance-covariance 
matrix, 853 

Systematic component, 40 

T

t (subscript), 21 

T (total number of observations), 21 
t distribution, 879
t ratios, 330, 331, 337
τ (tau) statistic, 755-757
t test, 115-118, 249
Target variable, 9 

Taylor’s series expansion, 530, 538 
Taylor’s theorem, 537-538 
Technology, 622 

“Ten Commandments of Applied Econometrics” (Peter Kennedy), 511

Test of significance, 115-119, 836-837 
ANOVA in matrix notation, 860-861 
χ² test, 118-119
confidence interval vs., 124 
overall (see Overall significance testing) 
t test, 115-118
Test statistic, 115, 831 
Tests of non-nested hypotheses, 488-492 
Davidson-MacKinnon J test, 490-492
discerning approach, 488-492 
discrimination approach, 488 
non-nested F test, 488-489
Tests of specification errors, 474-482 
Texas economy application, 789-790 
TGARCH (threshold GARCH), 799 
Theoretical econometrics, 10, 11 
Theoretical probability distributions: 
Bernoulli binomial distribution, 822 
binomial distribution, 822-823 
chi-square distribution, 819-820 
F distribution, 821-822 
normal distribution, 816-819 
Poisson distribution, 823 
Student’s t distribution, 820 
Three-variable regression model: 
adjusted R², 201-207
Cobb-Douglas production function, 
207-209 

estimation of partial regression 
coefficients, 192-198 
example, 198-200 

interpretation of regression equation, 191 
multiple coefficient of correlation, 198 
multiple coefficient of determination, 
196-197 

notation/assumptions, 188-190 
partial regression coefficients, 191-192

specification bias, 200-201 
standardized variables, regression on, 
199-200 

Threshold GARCH (TGARCH), 799 

Threshold level, 566 

Time derivative, 714n 

Time effect, 598 

Time sequence plot, 430 

Time series, 290 

Time series data, 737-769, 773-799 
approaches to, 773-775 
Box-Jenkins methodology, 777-784 
cointegration, 762-765 
and cross-section data, 591 
and cross-sectional data, 343 
defined, 21-23 

economic applications, 765-768 
examples of, 796-798 
key concepts with, 739 
modeling, 775-777 
spurious regression phenomenon with, 
747-748 

stationarity, tests of, 748-754 
stochastic processes, 740-747 
transforming nonstationary time series 
to, 760-762 
unit root tests, 754-760 
U.S. economy, 738-739 
vector autoregression, 784-790 
volatility measurement in, 791-796 
Time series econometrics, 22, 345 
Time-invariant variable, 595, 596 
Time-series regression, 270 
Time-to-event data analysis, 580 
Time-variant variable, 596 
Tobit model, 574-577 
Tolerance, 340 

Total sum of squares (TSS), 74 
Toxicity study, 586 
TPF (transcendental production 
function), 267 

Traditional econometric methodology, 2-3 
Transcendental production function 
(TPF), 267 

Transformation of variables, 344-345
Transposition, 839 
Transposition, matrix, 843 
Trend stationary, 745 


Trend stationary process (TSP), 745 
Trend stationary (TS) stochastic processes, 
745-746 
Trends, 22 

Trend-stationary processes, 761-762 
Trial-and-error method, 527-529 
Triangular (arithmetic) distributed-lag 
model, 661 

Triangular models, 712, 713n 
Trichotomous variable, 542 
True level of significance, 475-476
Truncated sample, 574n 
TS stochastic processes (see Trend stationary stochastic processes)

TSP (trend stationary process), 745 
TSS (total sum of squares), 74 
2SLS (see Two-stage least squares)

2-t rule of thumb, 120
Two-sided hypothesis, 113-114 
Two-stage least squares (2SLS), 718-724, 736

Two-tail hypothesis test, 113-114 
Two-tail test of significance, 117 
Two-variable linear regression model, 13 
Two-variable regression analysis, 21, 34-48 
examples of, 45-47 
linearity in, 38-39 

population regression function, 37-38 
sample regression function, 42-45 
stochastic disturbance in, 41-42
stochastic specification of PRF, 39-41
Two-variable regression model, 147-175 
elasticity measurement, 159-162 
estimation problem, 55-85 
classical linear regression model, 61-69 
coefficient of determination r², 73-78
examples, 78-83 
Gauss-Markov theorem, 71-73 
Monte Carlo experiments, 83-84 
ordinary least squares method, 55-61 
precision/standard errors, 69-71 
functional models of, 159 
log-linear model, 159-162 
reciprocal models, 166-172 
selection, 172-173 
semilog models, 162-166 
growth measurement, 162-166 
hypothesis testing, 113-124 

accepting/rejecting hypothesis, 119 
choosing level of significance, 121-122
confidence-interval approach, 113-115 
exact level of significance, 122-123 
forming null/alternative hypotheses, 121
selection of method, 124 
statistical vs. practical significance, 
123-124 

test-of-significance approach, 115-119

zero null hypothesis/2-t rule, 120


hypothetical example of, 34-37 
interval estimation, 107-112 
confidence intervals, 109-112 
statistical prerequisites, 107 
regression through the origin, 147-153 
and scaling/units of measurement, 
154-157 

on standardized variables, 157-159 
and stochastic error, 174-175 
Two-way fixed effects model, 598 
Type I error, 108n, 114n, 121, 122, 833, 834 
Type II error, 121, 122, 833 


U 

Unbalanced panel, 25, 593 
Unbiasedness, 520-521, 826, 827 
assumption regarding, 189, 367 
of BLUE, 72 

of least-squares estimators, 92-93 
Unconditional expected value, 35 
Underdifferencing, 761 
Underfitting, of model, 471-473 
Underidentification, 692-694 
Underprediction, 8 
Ungrouped data, 561-566, 570-571, 
589-590 

Unit change in value of regressor in, 
199-200, 571 
Unit matrix, 840 
Unit root problem, 744 
Unit root stochastic processes, 744 
Unit root tests:
augmented Dickey-Fuller test, 757-758

critique, 759-760 
F test, 758 

1% and 5% critical Dickey-Fuller t and 
F values for, 893 
Phillips-Perron, 758 
structural changes testing, 758-759 
time series data, 754-760 
Units of measurement, 157 
Universal regression, law of, 15 
University of Michigan, 22 
Unobservable variable, 603 
Unobserved effect, 595 
Unrestricted residual sum of squares 
(RSSur), 257-258 
Upper confidence limit, 108 
Upward trend, 164 
U.S. Census Bureau, 22, 901 
U.S. Department of Commerce, 23, 27

U.S. economic time series, 738-739 
U.S. inflation rate, 797-798
U.S. Treasury bills examples, 767-768

Utility index, 566 

V

Vagueness, of theory, 41 
Validity, of instruments, 669-670 
VAR model (see Vector autoregression 
model) 

Variables: 
dropping, 343-344 
measurement scales of, 27-28 
orthogonal, 355 
standardized, 183-184 
transformation of, 344-345 
Variance: 

of individual prediction, 146, 862 
of least-squares estimators, 93 
of mean prediction, 145-146, 862 
of OLS estimators, 194-195 
of probability distribution, 810-811 
variation vs., 74n 

Variance-covariance matrix, 852-853, 
856-857, 875 

Variance-inflating factor (VIF), 328, 340 
Variation, variance vs., 74n 
Vector autoregression (VAR) model, 653, 
655, 773, 775 
causality, 787-788 
estimation, 785-786 
forecasting, 786-787 
problems with, 788-789 
Texas economy application, 789-790 
time series data, 784-790 


Venn diagram, 73, 74 

VIF (see Variance-inflating factor) 

Volatility, 791 

Volatility clustering, 773 

Volatility measurement: 

ARCH presence, 795 
Durbin-Watson d and ARCH effect, 796 
in financial time series, 791-796 
GARCH model, 796 
NYSE price changes example, 794-795
U.S./U.K. exchange rate example, 791-794 
Von Neumann ratio, 454 


W 

Wage equations, 614 
Wald test, 259-260, 299n
Weakly exogenous regressors, 468 
Weakly stationary, 740 
Weekly data, 22 
Weierstrass’ theorem, 645 
Weighted least squares (WLS), 373, 
389-390, 409-410

WG estimator (see Within-group estimator) 
White noise error, 419, 750 
White noise process, 741 
White’s general heteroscedasticity test, 
386-389, 396, 398-399
White’s heteroscedasticity-consistent 
standard errors, 391, 411, 503 


Wide sense, stochastic process, 740 
Wiener-Granger causality test, 653n 
Within-group (WG) estimator, 599-602 
WLS (see Weighted least squares) 

WLS estimators, 373 
World Fact Book, 901 
World Wide Web resources, 900-901 


X 

X (explanatory variable), 21 
assumption on nature of, 68 
independence of, 62-63, 316-317 


Y 

Y (dependent variable), 21 


Z 

Z test, 836-837

Zellner SURE estimation technique, 714n 
Zero contemporaneous correlation, 713 
Zero correlation, 77 
Zero mean value of uᵢ (assumption 3),
63-64, 317

Zero null hypothesis, 120 
Zero-intercept model, 148-150 




